Email : yanis.zirem@univ-lille.fr | Classe
: M1 Bioinformatique - parcours OSB et MISO
R
: 4.4.x | ️ RStudio : 2024.09+ |
Bioconductor : 3.20
Le cancer colorectal est l’un des cancers les plus fréquents dans le monde :
Statistiques mondiales (2020) : - 3ème cancer le plus diagnostiqué (1.9 million de cas/an) - 2ème cause de décès par cancer (~900,000 décès/an) - Incidence croissante chez les jeunes (<50 ans) - Survie à 5 ans : 65% (tous stades), 90% (stade I), 14% (stade IV)
Le CRC se développe sur 10-15 ans selon ce modèle:
Épithélium normal
↓ Mutation APC
Polype bénin (adénome)
↓ Mutation KRAS
Adénome dysplasique
↓ Mutation TP53, SMAD4
Carcinome in situ
↓ Invasion stroma
Carcinome invasif
↓ Dissémination
Métastases (foie, poumon)
| Gène/Voie | Type | Fréquence | Fonction | Impact |
|---|---|---|---|---|
| APC | Suppresseur | 80% | Wnt/β-catenin | Initiation |
| KRAS | Oncogène | 40% | MAPK signaling | Progression |
| TP53 | Suppresseur | 50% | Apoptose, cycle | Progression |
| PIK3CA | Oncogène | 15% | PI3K/Akt/mTOR | Prolifération |
| SMAD4 | Suppresseur | 10% | TGF-β | Invasion |
| BRAF | Oncogène | 10% | MAPK (V600E) | Mauvais pronostic |
L’analyse RNA-seq du CRC permet de :
✅ Identifier les gènes dérégulés (DEG)
✅ Classifier les tumeurs en sous-types moléculaires
(CMS1-4)
✅ Découvrir de nouveaux biomarqueurs
diagnostiques
✅ Prédire la réponse thérapeutique
✅ Proposer des cibles thérapeutiques
TCGA (2006-2018) : Plus grand effort de caractérisation moléculaire du cancer
Projet : TCGA Colon Adenocarcinoma (TCGA-COAD)
Technologie :
- RNA-seq Illumina HiSeq 2000/2500
- Paired-end 2×75bp ou 2×100bp
- Alignement : STAR v2.7
- Quantification : HTSeq-Counts
Échantillons :
- ~480 échantillons au total dans TCGA - Pour
ce TD : Sous-ensemble filtré et pré-traité - Tumeurs primaires
(Primary Solid Tumor) - Tissus adjacents normaux (Solid Tissue
Normal)
Gènes :
- ~60,000 features (ensemble Ensembl) - ~20,000
gènes codants (protein-coding)
The Cancer Genome Atlas Network. (2012).
Comprehensive molecular characterization of human colon and rectal cancer.
Nature, 487(7407), 330-337.
DOI: 10.1038/nature11252 Xena dataset : [https://xenabrowser.net/datapages/?dataset=TCGA.COAD.sampleMap/HiSeqV2&host=https%3A%2F%2Ftcga.xenahubs.net]
Principaux Résultats TCGA-COAD : - 16% des tumeurs : hypermutées (MSI-H, POLE mutations) - 84% des tumeurs : non-hypermutées (chromosomal instability) - Voie WNT dérégulée dans 93% des cas - Identification de 4 sous-types CMS (Consensus Molecular Subtypes)
À la fin de ce TD, vous serez capable de :
** COMPÉTENCES TECHNIQUES**
** COMPÉTENCES BIOLOGIQUES**
** CONSEIL** : Exécutez cette section UNE SEULE FOIS
sur votre ordinateur.
Si les packages sont déjà installés, passez directement au
chargement.
# Installer BiocManager (gestionnaire de packages Bioconductor)
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
# BIOCONDUCTOR
BiocManager::install(c(
"DESeq2", # Analyse différentielle
"org.Hs.eg.db", # Annotations Homo sapiens
"AnnotationDbi", # Interface annotations
"clusterProfiler", # Enrichissement GO, KEGG
"enrichplot", # Visualisations enrichissement
"ReactomePA", # Reactome pathways
"EnhancedVolcano" # Volcano plots
))
# CRAN
install.packages(c(
"tidyverse", # ggplot2, dplyr, etc.
"pheatmap", # Heatmaps
"patchwork", # Combiner plots
"plotly", # Plots interactifs
"viridis", # Palettes de couleurs
"dendextend" # Dendrogrammes colorés
))# CHARGEMENT DES LIBRARIES
# Manipulation de données
library(tidyverse) # ggplot2, dplyr, tidyr, etc.
# Analyse RNA-seq
library(DESeq2) # Analyse différentielle
library(EnhancedVolcano)
# Annotations
library(org.Hs.eg.db) # Annotations humaines
library(AnnotationDbi) # Interface
# Enrichissement
library(clusterProfiler) # GO, KEGG enrichment
library(enrichplot) # Visualisations
library(ReactomePA) # Reactome pathways
# Visualisations
library(pheatmap) # Heatmaps
library(patchwork)
# Couleurs
library(RColorBrewer) # Palettes
library(viridis) # Palettes accessibles
# PCA
library(FactoMineR) # PCA avancée
library(factoextra) # Visualisation PCA
# Paramètres ggplot par défaut
theme_set(theme_minimal(base_size = 14))
# Désactiver notation scientifique
options(scipen = 999)
cat("les packages chargés avec succès!\n")#> les packages chargés avec succès!
#> R version 4.3.1 (2023-06-16 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 11 x64 (build 26100)
#>
#> Matrix products: default
#>
#>
#> locale:
#> [1] LC_COLLATE=French_France.utf8 LC_CTYPE=French_France.utf8
#> [3] LC_MONETARY=French_France.utf8 LC_NUMERIC=C
#> [5] LC_TIME=French_France.utf8
#>
#> time zone: Europe/Paris
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] factoextra_1.0.7 FactoMineR_2.11
#> [3] viridis_0.6.5 viridisLite_0.4.3
#> [5] RColorBrewer_1.1-3 patchwork_1.3.2
#> [7] pheatmap_1.0.13 ReactomePA_1.46.0
#> [9] enrichplot_1.22.0 clusterProfiler_4.10.1
#> [11] org.Hs.eg.db_3.18.0 AnnotationDbi_1.64.1
#> [13] EnhancedVolcano_1.20.0 ggrepel_0.9.6
#> [15] DESeq2_1.42.1 SummarizedExperiment_1.32.0
#> [17] Biobase_2.62.0 MatrixGenerics_1.14.0
#> [19] matrixStats_1.5.0 GenomicRanges_1.54.1
#> [21] GenomeInfoDb_1.38.8 IRanges_2.36.0
#> [23] S4Vectors_0.40.2 BiocGenerics_0.48.1
#> [25] lubridate_1.9.4 forcats_1.0.1
#> [27] stringr_1.6.0 dplyr_1.1.4
#> [29] purrr_1.0.4 readr_2.1.5
#> [31] tidyr_1.3.1 tibble_3.2.1
#> [33] ggplot2_4.0.2 tidyverse_2.0.0
#>
#> loaded via a namespace (and not attached):
#> [1] rstudioapi_0.18.0 jsonlite_2.0.0 magrittr_2.0.3
#> [4] estimability_1.5.1 farver_2.1.2 rmarkdown_2.30
#> [7] fs_1.6.5 zlibbioc_1.48.2 vctrs_0.6.5
#> [10] memoise_2.0.1 RCurl_1.98-1.17 ggtree_3.10.1
#> [13] htmltools_0.5.8.1 S4Arrays_1.2.1 SparseArray_1.2.4
#> [16] gridGraphics_0.5-1 sass_0.4.9 bslib_0.10.0
#> [19] htmlwidgets_1.6.4 plyr_1.8.9 emmeans_2.0.1
#> [22] cachem_1.1.0 uuid_1.2-1 igraph_2.1.4
#> [25] lifecycle_1.0.5 pkgconfig_2.0.3 gson_0.1.0
#> [28] Matrix_1.5-4.1 R6_2.6.1 fastmap_1.2.0
#> [31] GenomeInfoDbData_1.2.11 digest_0.6.37 aplot_0.2.9
#> [34] colorspace_2.1-1 RSQLite_2.3.9 timechange_0.3.0
#> [37] httr_1.4.8 polyclip_1.10-7 abind_1.4-8
#> [40] compiler_4.3.1 bit64_4.6.0-1 withr_3.0.2
#> [43] graphite_1.48.0 S7_0.2.0 BiocParallel_1.36.0
#> [46] DBI_1.2.3 ggforce_0.4.2 MASS_7.3-60
#> [49] rappdirs_0.3.3 DelayedArray_0.28.0 scatterplot3d_0.3-44
#> [52] HDO.db_0.99.1 flashClust_1.01-2 tools_4.3.1
#> [55] otel_0.2.0 scatterpie_0.2.6 ape_5.8-1
#> [58] glue_1.8.0 nlme_3.1-168 GOSemSim_2.28.1
#> [61] shadowtext_0.1.6 grid_4.3.1 cluster_2.1.8.1
#> [64] reshape2_1.4.4 fgsea_1.28.0 generics_0.1.4
#> [67] gtable_0.3.6 tzdb_0.5.0 data.table_1.17.0
#> [70] hms_1.1.4 tidygraph_1.3.1 XVector_0.42.0
#> [73] pillar_1.11.1 yulab.utils_0.2.4 splines_4.3.1
#> [76] tweenr_2.0.3 treeio_1.26.0 lattice_0.22-7
#> [79] bit_4.6.0 tidyselect_1.2.1 GO.db_3.18.0
#> [82] locfit_1.5-9.12 Biostrings_2.70.3 reactome.db_1.86.2
#> [85] knitr_1.51 gridExtra_2.3 xfun_0.52
#> [88] graphlayouts_1.2.2 DT_0.34.0 stringi_1.8.7
#> [91] lazyeval_0.2.2 ggfun_0.2.0 yaml_2.3.7
#> [94] evaluate_1.0.5 codetools_0.2-20 ggraph_2.2.1
#> [97] qvalue_2.34.0 graph_1.80.0 multcompView_0.1-11
#> [100] ggplotify_0.1.3 cli_3.6.4 xtable_1.8-4
#> [103] systemfonts_1.2.2 jquerylib_0.1.4 Rcpp_1.0.14
#> [106] png_0.1-8 parallel_4.3.1 leaps_3.2
#> [109] blob_1.3.0 DOSE_3.28.2 bitops_1.0-9
#> [112] mvtnorm_1.3-3 tidytree_0.4.7 ggiraph_0.8.13
#> [115] scales_1.4.0 crayon_1.5.3 rlang_1.1.5
#> [118] cowplot_1.2.0 fastmatch_1.1-6 KEGGREST_1.42.0
# Créer structure de dossiers
dir.create("results", showWarnings = FALSE)
dir.create("results/tables", showWarnings = FALSE, recursive = TRUE)
dir.create("results/enrichment", showWarnings = FALSE, recursive = TRUE)
dir.create("figures", showWarnings = FALSE)
dir.create("figures/QC", showWarnings = FALSE, recursive = TRUE)
dir.create("figures/DEG", showWarnings = FALSE, recursive = TRUE)
dir.create("figures/enrichment", showWarnings = FALSE, recursive = TRUE)
cat("Structure de dossiers créée:\n")#> Structure de dossiers créée:
#> results/
#> ├─ tables/ (tableaux CSV/Excel)
#> └─ enrichment/ (résultats GO/KEGG)
#> figures/
#> ├─ QC/ (PCA, clustering, distributions)
#> ├─ DEG/ (Volcano, MA, heatmaps)
#> └─ enrichment/ (dotplots, barplots, GSEA)
** FICHIERS REQUIS**
Assurez-vous d’avoir ces deux fichiers dans votre répertoire de
travail : - COAD_counts_matrix.rds : Matrice de comptages
(gènes × échantillons) - COAD_metadata.rds : Métadonnées
des échantillons
#> Chargement des données TCGA-COAD...
counts_matrix <- readRDS("COAD_counts_matrix.rds")
metadata <- readRDS("COAD_metadata.rds")
cat("Données chargées avec succès!\n\n")#> Données chargées avec succès!
#> === DIMENSIONS ===
#> Matrice de comptages: 16892 gènes × 327 échantillons
#> Métadonnées: 327 échantillons × 4 variables
#> === APERÇU MATRICE DE COMPTAGES ===
#>
#> Type de données: data.frame
#> Type de valeurs: numeric
# Convertir en matrice numérique
counts_mat <- as.matrix(counts_matrix)
# Statistiques globales
cat("Comptage minimum:", min(counts_mat), "\n")#> Comptage minimum: 0
#> Comptage maximum: 1,336,102
#> Comptage moyen: 1131.2
#> Comptage médian: 340
QUESTION 1a : Combien d’échantillons tumoraux et
normaux avons-nous?
(Répondre dans le document Word séparé)
#> === APERÇU MÉTADONNÉES ===
#>
#> === STRUCTURE ===
#> 'data.frame': 327 obs. of 4 variables:
#> $ sample_id : chr "TCGA-CA-5256-01" "TCGA-AZ-6599-01" "TCGA-AA-3655-01" "TCGA-A6-6137-01" ...
#> $ barcode : chr "TCGA-CA-5256-01" "TCGA-AZ-6599-01" "TCGA-AA-3655-01" "TCGA-A6-6137-01" ...
#> $ condition : chr "Tumor" "Tumor" "Tumor" "Tumor" ...
#> $ patient_id: chr "TCGA-CA-5256" "TCGA-AZ-6599" "TCGA-AA-3655" "TCGA-A6-6137" ...
#>
#> === RÉSUMÉ ===
#> sample_id barcode condition patient_id
#> Length:327 Length:327 Length:327 Length:327
#> Class :character Class :character Class :character Class :character
#> Mode :character Mode :character Mode :character Mode :character
# Compter les échantillons par groupe
table_groups <- table(metadata$condition)
cat("=== DISTRIBUTION DES GROUPES ===\n")#> === DISTRIBUTION DES GROUPES ===
#>
#> Normal Tumor
#> 41 286
#>
#> Pourcentages:
#>
#> Normal Tumor
#> 12.5 87.5
# Visualisation
barplot(table_groups,
col = c("steelblue", "coral"),
main = "Distribution des Échantillons",
ylab = "Nombre d'échantillons",
las = 1)QUESTION 1b : Pourquoi y a-t-il beaucoup plus de tumeurs que de normaux dans TCGA?
# Vérifier que les échantillons correspondent
samples_counts <- colnames(counts_matrix)
samples_metadata <- rownames(metadata)
cat("=== CORRESPONDANCE ÉCHANTILLONS ===\n")#> === CORRESPONDANCE ÉCHANTILLONS ===
#> Échantillons dans counts: 327
#> Échantillons dans metadata: 327
# Vérification stricte
all_match <- all(samples_counts %in% samples_metadata) &&
all(samples_metadata %in% samples_counts)
if(all_match) {
cat("Tous les échantillons correspondent parfaitement!\n")
} else {
cat(" Attention: Certains échantillons ne correspondent pas!\n")
# Identifier les différences
missing_in_metadata <- setdiff(samples_counts, samples_metadata)
missing_in_counts <- setdiff(samples_metadata, samples_counts)
if(length(missing_in_metadata) > 0) {
cat("Manquants dans metadata:", length(missing_in_metadata), "\n")
}
if(length(missing_in_counts) > 0) {
cat("Manquants dans counts:", length(missing_in_counts), "\n")
}
}#> Tous les échantillons correspondent parfaitement!
# Réordonner metadata pour correspondre exactement à counts
metadata <- metadata[colnames(counts_matrix), , drop = FALSE]
cat("\nMétadonnées réordonnées pour correspondre à la matrice de comptages\n")#>
#> Métadonnées réordonnées pour correspondre à la matrice de comptages
# Calculer les comptages totaux par échantillon
library_sizes <- colSums(counts_matrix)
# Créer un dataframe pour ggplot
df_lib_sizes <- data.frame(
Sample = names(library_sizes),
Library_Size = library_sizes,
Sample_Type = metadata$condition
)
# Statistiques
cat("=== LIBRARY SIZES ===\n")#> === LIBRARY SIZES ===
#> Minimum: 16,313,130 reads
#> Maximum: 26,263,500 reads
#> Moyenne: 19,108,199 reads
#> Médiane: 18,871,360 reads
#>
#> === PAR GROUPE ===
aggregate(Library_Size ~ Sample_Type, data = df_lib_sizes, FUN = function(x) {
c(Mean = mean(x), Median = median(x), SD = sd(x))
})### Visualisation: Boxplot
# Boxplot
p1 <- ggplot(df_lib_sizes, aes(x = Sample_Type, y = Library_Size, fill = Sample_Type)) +
geom_boxplot(outlier.shape = 21, outlier.size = 2, alpha = 0.7) +
geom_jitter(width = 0.2, alpha = 0.3, size = 1) +
scale_fill_manual(values = c("Normal" = "steelblue", "Tumor" = "coral")) +
scale_y_continuous(labels = scales::comma) +
labs(
title = "Library Sizes par Groupe",
subtitle = "Distribution des comptages totaux par échantillon",
x = "Type d'échantillon",
y = "Comptages totaux (reads)",
fill = "Type"
) +
theme_minimal(base_size = 14) +
theme(legend.position = "none")
print(p1)# Sauvegarder
ggsave("figures/QC/library_sizes_boxplot.pdf", p1, width = 10, height = 6)
cat("Figure sauvegardée: figures/QC/library_sizes_boxplot.pdf\n")#> Figure sauvegardée: figures/QC/library_sizes_boxplot.pdf
QUESTION 2 : Observez-vous une différence dans les
comptages totaux entre tumeur et normal?
Qu’est-ce que cela pourrait indiquer?
# Density plot
p2 <- ggplot(df_lib_sizes, aes(x = Library_Size, fill = Sample_Type)) +
geom_density(alpha = 0.5) +
scale_fill_manual(values = c("Normal" = "steelblue", "Tumor" = "coral")) +
scale_x_continuous(labels = scales::comma) +
labs(
title = "Distribution des Library Sizes",
x = "Comptages totaux (reads)",
y = "Densité",
fill = "Type"
) +
theme_minimal(base_size = 14)
print(p2)# Compter gènes avec au moins 1 read par échantillon
genes_detected <- colSums(counts_matrix > 0)
df_genes <- data.frame(
Sample = names(genes_detected),
Genes_Detected = genes_detected,
Sample_Type = metadata$condition
)
# Statistiques
cat("=== GÈNES DÉTECTÉS ===\n")#> === GÈNES DÉTECTÉS ===
#> Minimum: 15495 gènes
#> Maximum: 16811 gènes
#> Moyenne: 16318 gènes
#>
#> === PAR GROUPE ===
p3 <- ggplot(df_genes, aes(x = Sample_Type, y = Genes_Detected, fill = Sample_Type)) +
geom_violin(alpha = 0.7) +
geom_boxplot(width = 0.2, fill = "white", outlier.shape = NA) +
scale_fill_manual(values = c("Normal" = "steelblue", "Tumor" = "coral")) +
labs(
title = "Gènes Détectés par Échantillon",
x = "Type d'échantillon",
y = "Nombre de gènes détectés (counts > 0)",
fill = "Type"
) +
theme_minimal(base_size = 14) +
theme(legend.position = "none")
print(p3)️ POURQUOI FILTRER?
Les gènes avec très peu de reads : - N’ont pas assez de puissance statistique - Augmentent le nombre de tests multiples (pénalité FDR) - Peuvent être du bruit technique
Critère standard : Garder les gènes avec ≥10 reads
dans ≥X échantillons
(où X = plus petit groupe, ici les normaux)
# Nombre minimum d'échantillons (taille du plus petit groupe)
min_samples <- min(table(metadata$condition))
cat("=== FILTRAGE DES GÈNES ===\n")#> === FILTRAGE DES GÈNES ===
#> Critère: ≥10 reads dans au moins 41 échantillons
#> Avant filtrage: 16892 gènes
# Appliquer le filtre
keep <- rowSums(counts_matrix >= 10) >= min_samples
counts_filtered <- counts_matrix[keep, ]
cat("Après filtrage:", nrow(counts_filtered), "gènes\n")#> Après filtrage: 16023 gènes
cat("Gènes supprimés:", nrow(counts_matrix) - nrow(counts_filtered),
"(", round((1 - nrow(counts_filtered)/nrow(counts_matrix))*100, 1), "%)\n")#> Gènes supprimés: 869 ( 5.1 %)
# Visualiser l'effet du filtrage
df_filter <- data.frame(
Category = c("Avant filtrage", "Après filtrage"),
N_Genes = c(nrow(counts_matrix), nrow(counts_filtered))
)
barplot(df_filter$N_Genes,
names.arg = df_filter$Category,
col = c("lightgray", "steelblue"),
main = "Effet du Filtrage",
ylab = "Nombre de gènes",
las = 1)#> === CRÉATION DE L'OBJET DESeq2 ===
# Vérifier que sample_type est un facteur
if(!is.factor(metadata$condition)) {
metadata$condition <- factor(metadata$condition)
}
# Définir le niveau de référence (Normal en premier)
metadata$sample_type <- relevel(metadata$condition, ref = "Normal")
cat("Niveaux du facteur sample_type:", levels(metadata$condition), "\n")#> Niveaux du facteur sample_type: Normal Tumor
#> Niveau de référence: Normal
# Créer l'objet DESeqDataSet
dds <- DESeqDataSetFromMatrix(
countData = counts_filtered,
colData = metadata,
design = ~ condition
)
cat("Objet DESeqDataSet créé!\n\n")#> Objet DESeqDataSet créé!
#> === INFORMATIONS OBJET DDS ===
#> class: DESeqDataSet
#> dim: 16023 327
#> metadata(1): version
#> assays(1): counts
#> rownames(16023): ARHGEF10L HIF3A ... SELP SELS
#> rowData names(0):
#> colnames(327): TCGA-CA-5256-01 TCGA-AZ-6599-01 ... TCGA-CA-5796-01
#> TCGA-G4-6626-01
#> colData names(5): sample_id barcode condition patient_id sample_type
** POURQUOI VST?**
RNA-seq a une variance dépendant de la moyenne : - Gènes peu exprimés → variance élevée - Gènes très exprimés → variance faible
VST stabilise la variance pour permettre : - PCA - Clustering hiérarchique - Heatmaps - Visualisations
IMPORTANT : VST est pour VISUALISATION
uniquement!
L’analyse différentielle utilise les comptages BRUTS.
#> === TRANSFORMATION VST ===
#> Cette étape peut prendre quelques minutes...
# VST avec blind=TRUE (ignore le design expérimental)
vst_data <- vst(dds, blind = TRUE)
cat("Transformation VST terminée!\n\n")#> Transformation VST terminée!
# Extraire la matrice transformée
vst_counts <- assay(vst_data)
cat("Dimensions données VST:", dim(vst_counts), "\n")#> Dimensions données VST: 16023 327
#> Range des valeurs VST: 5.39 20.43
QUESTION 3a : Pourquoi utilise-t-on une transformation VST au lieu des comptages bruts pour la PCA?
QUESTION 3b : Que signifie le paramètre
blind = TRUE?
#> === ANALYSE PCA ===
# Calculer la PCA (transposer car PCA veut échantillons en lignes)
pca_result <- prcomp(t(vst_counts), center = TRUE, scale. = FALSE)
# Variance expliquée
var_explained <- round(100 * (pca_result$sdev^2 / sum(pca_result$sdev^2)), 2)
cat("Variance expliquée par les 10 premières PC:\n")#> Variance expliquée par les 10 premières PC:
#> [1] 20.26 11.06 6.99 3.30 2.89 2.48 2.12 1.96 1.69 1.41
#> PC1 + PC2: 31.32 %
#> PC1 + PC2 + PC3: 38.31 %
# Créer dataframe pour ggplot
pca_data <- data.frame(
Sample = rownames(pca_result$x),
PC1 = pca_result$x[, 1],
PC2 = pca_result$x[, 2],
PC3 = pca_result$x[, 3],
PC4 = pca_result$x[, 4],
Sample_Type = metadata$condition,
Library_Size = colSums(counts_filtered)
)# Plot PC1 vs PC2
p_pca <- ggplot(pca_data, aes(x = PC1, y = PC2, color = Sample_Type, shape = Sample_Type)) +
geom_point(size = 4, alpha = 0.7) +
scale_color_manual(values = c("Normal" = "steelblue", "Tumor" = "coral")) +
scale_shape_manual(values = c("Normal" = 16, "Tumor" = 17)) +
labs(
title = "PCA: Tumeurs vs Tissus Normaux",
subtitle = paste0("PC1 (", var_explained[1], "%) vs PC2 (", var_explained[2], "%)"),
x = paste0("PC1 (", var_explained[1], "% de variance)"),
y = paste0("PC2 (", var_explained[2], "% de variance)"),
color = "Type d'échantillon",
shape = "Type d'échantillon"
) +
theme_minimal(base_size = 14) +
theme(
legend.position = "right",
panel.grid.major = element_line(color = "gray90"),
panel.border = element_rect(color = "black", fill = NA)
) +
# Ajouter ellipses de confiance
stat_ellipse(aes(fill = Sample_Type), alpha = 0.2, geom = "polygon", level = 0.95) +
scale_fill_manual(values = c("Normal" = "steelblue", "Tumor" = "coral"), guide = "none")
print(p_pca)ggsave("figures/QC/PCA_PC1_PC2.pdf", p_pca, width = 12, height = 10)
cat("Figure sauvegardée: figures/QC/PCA_PC1_PC2.pdf\n")#> Figure sauvegardée: figures/QC/PCA_PC1_PC2.pdf
QUESTION 4a : Les tumeurs et les normaux sont-ils bien séparés sur la PCA?
QUESTION 4b : Sur quelle composante principale observe-t-on la meilleure séparation?
QUESTION 4c : Que représente la variance expliquée par PC1 et PC2?
QUESTION 4d : Y a-t-il des échantillons outliers visibles?
# PC1 vs PC3
p_pc1_pc3 <- ggplot(pca_data, aes(x = PC1, y = PC3, color = Sample_Type)) +
geom_point(size = 3, alpha = 0.7) +
scale_color_manual(values = c("Normal" = "steelblue", "Tumor" = "coral")) +
labs(
title = paste0("PC1 (", var_explained[1], "%) vs PC3 (", var_explained[3], "%)"),
x = paste0("PC1 (", var_explained[1], "%)"),
y = paste0("PC3 (", var_explained[3], "%)")
) +
theme_minimal()
# PC2 vs PC3
p_pc2_pc3 <- ggplot(pca_data, aes(x = PC2, y = PC3, color = Sample_Type)) +
geom_point(size = 3, alpha = 0.7) +
scale_color_manual(values = c("Normal" = "steelblue", "Tumor" = "coral")) +
labs(
title = paste0("PC2 (", var_explained[2], "%) vs PC3 (", var_explained[3], "%)"),
x = paste0("PC2 (", var_explained[2], "%)"),
y = paste0("PC3 (", var_explained[3], "%)")
) +
theme_minimal()
# Combiner avec patchwork
p_combined_pca <- p_pc1_pc3 / p_pc2_pc3 +
plot_layout(guides = "collect") &
theme(legend.position = "bottom")
print(p_combined_pca)# Dataframe pour scree plot
df_scree <- data.frame(
PC = paste0("PC", 1:20),
Variance = var_explained[1:20],
PC_num = 1:20
)
# Scree plot
p_scree <- ggplot(df_scree, aes(x = PC_num, y = Variance)) +
geom_line(color = "steelblue", size = 1) +
geom_point(color = "steelblue", size = 3) +
geom_text(aes(label = paste0(round(Variance, 1), "%")),
vjust = -0.5, size = 3) +
scale_x_continuous(breaks = 1:20, labels = df_scree$PC) +
labs(
title = "Scree Plot: Variance Expliquée par Composante Principale",
x = "Composante Principale",
y = "Variance Expliquée (%)"
) +
theme_minimal(base_size = 14) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(p_scree)#> === CALCUL MATRICE DE DISTANCE ===
# Calculer distances euclidiennes entre échantillons
sample_dists <- dist(t(vst_counts), method = "euclidean")
# Convertir en matrice pour heatmap
sample_dist_matrix <- as.matrix(sample_dists)
cat("Dimensions matrice de distance:", dim(sample_dist_matrix), "\n")#> Dimensions matrice de distance: 327 327
#> Distance minimum: 34.36
#> Distance maximum: 218.46
# Annotations pour la heatmap
annotation_col <- data.frame(
Type = metadata$condition,
row.names = rownames(metadata)
)
# Couleurs annotations
annotation_colors <- list(
Type = c(Normal = "steelblue", Tumor = "coral")
)
# Heatmap avec pheatmap
p_dist <- pheatmap(
sample_dist_matrix,
clustering_distance_rows = sample_dists,
clustering_distance_cols = sample_dists,
annotation_col = annotation_col,
annotation_row = annotation_col,
annotation_colors = annotation_colors,
color = viridis(100, option = "C"),
main = "Distances Entre Échantillons (VST)",
fontsize = 8,
show_rownames = FALSE,
show_colnames = FALSE
)
p_distpdf("figures/QC/sample_distance_heatmap.pdf", width = 12, height = 12)
grid::grid.draw(p_dist$gtable)
dev.off()#> png
#> 2
#> Heatmap sauvegardée: figures/QC/sample_distance_heatmap.pdf
library(dendextend)
# Factoriser les conditions
metadata$condition <- factor(metadata$condition, levels = c("Normal", "Tumor"))
group_colors <- c("Normal" = "steelblue", "Tumor" = "coral")
# Séparer les matrices par condition
counts_normal <- counts_matrix[, metadata$condition == "Normal"]
counts_tumor <- counts_matrix[, metadata$condition == "Tumor"]
metadata_normal <- metadata[metadata$condition == "Normal", ]
metadata_tumor <- metadata[metadata$condition == "Tumor", ]
# ========================================
# Dendrogramme Normal
# ========================================
sample_dists_normal <- dist(t(counts_normal))
hc_normal <- hclust(sample_dists_normal, method = "ward.D2")
dend_normal <- as.dendrogram(hc_normal)
leaf_colors_normal <- rep(group_colors["Normal"], ncol(counts_normal))
labels_colors(dend_normal) <- leaf_colors_normal
# Affichage dans le notebook
par(mar = c(5,4,4,2))
plot(dend_normal,
main = "Dendrogramme - Normal",
ylab = "Distance",
leaflab = "none")
colored_bars(colors = leaf_colors_normal, dend = dend_normal, rowLabels = "Condition")pdf("figures/QC/dendrogram_normal.pdf", width = 14, height = 8)
par(mar = c(5,4,4,2))
plot(dend_normal,
main = "Dendrogramme - Normal",
ylab = "Distance",
leaflab = "none")
colored_bars(colors = leaf_colors_normal, dend = dend_normal, rowLabels = "Condition")
dev.off()#> png
#> 2
#> Dendrogramme Normal sauvegardé: figures/QC/dendrogram_normal.pdf
# ========================================
# Dendrogramme Tumor
# ========================================
sample_dists_tumor <- dist(t(counts_tumor))
hc_tumor <- hclust(sample_dists_tumor, method = "ward.D2")
dend_tumor <- as.dendrogram(hc_tumor)
leaf_colors_tumor <- rep(group_colors["Tumor"], ncol(counts_tumor))
labels_colors(dend_tumor) <- leaf_colors_tumor
# Affichage
par(mar = c(5,4,4,2))
plot(dend_tumor,
main = "Dendrogramme - Tumor",
ylab = "Distance",
leaflab = "none")
colored_bars(colors = leaf_colors_tumor, dend = dend_tumor, rowLabels = "Condition")pdf("figures/QC/dendrogram_tumor.pdf", width = 14, height = 8)
par(mar = c(5,4,4,2))
plot(dend_tumor,
main = "Dendrogramme - Tumor",
ylab = "Distance",
leaflab = "none")
colored_bars(colors = leaf_colors_tumor, dend = dend_tumor, rowLabels = "Condition")
dev.off()#> png
#> 2
#> Dendrogramme Tumor sauvegardé: figures/QC/dendrogram_tumor.pdf
QUESTION 5a : Les échantillons tumoraux se regroupent-ils ensemble?
QUESTION 5b : Même question pour les normaux?
QUESTION 5c : Identifiez-vous des sous-groupes parmi les tumeurs?
QUESTION 5d : Comment expliquez-vous cette hétérogénéité?
ÉTAPE CRITIQUE
DESeq2 effectue 3 étapes : 1. Estimation des facteurs de taille (normalisation) 2. Estimation de la dispersion (variance par gène) 3. Tests statistiques (Wald test ou LRT)
Cette étape peut prendre 5-15 minutes selon la taille du dataset et votre ordinateur.
#> === EXÉCUTION DE DESeq2 ===
#> Cette étape peut prendre plusieurs minutes...
#> Annalyse DESeq2 terminée!
#> Coefficients du modèle:
#> [1] "Intercept" "condition_Tumor_vs_Normal"
#> === EXTRACTION DES RÉSULTATS ===
# Extraire les résultats (comparaison Tumor vs Normal)
res <- results(dds,
contrast = c("condition", "Tumor", "Normal"),
alpha = 0.05) # Seuil FDR
cat("Comparaison effectuée: Tumor vs Normal\n")#> Comparaison effectuée: Tumor vs Normal
#> Seuil FDR (alpha): 0.05
#> === RÉSUMÉ DES RÉSULTATS ===
#>
#> out of 16023 with nonzero total read count
#> adjusted p-value < 0.05
#> LFC > 0 (up) : 6414, 40%
#> LFC < 0 (down) : 5951, 37%
#> outliers [1] : 0, 0%
#> low counts [2] : 0, 0%
#> (mean count < 3)
#> [1] see 'cooksCutoff' argument of ?results
#> [2] see 'independentFiltering' argument of ?results
# Ordonner par p-value ajustée
res_ordered <- res[order(res$padj), ]
cat("\n=== TOP 10 GÈNES (padj le plus faible) ===\n")#>
#> === TOP 10 GÈNES (padj le plus faible) ===
#> === FILTRAGE GÈNES DIFFÉRENTIELLEMENT EXPRIMÉS ===
# Critères: padj < 0.05 ET |log2FC| > 1
threshold_padj <- 0.05
threshold_log2fc <- 1
# Gènes significatifs
res_sig <- res[which(res$padj < threshold_padj & abs(res$log2FoldChange) > threshold_log2fc), ]
res_sig <- res_sig[!is.na(res_sig$padj), ] # Enlever les NA
cat("Critères:\n")#> Critères:
#> - padj < 0.05
#> - |log2FC| > 1
#> Gènes différentiellement exprimés: 4316
# Sur-exprimés vs sous-exprimés
up_regulated <- res_sig[res_sig$log2FoldChange > threshold_log2fc, ]
down_regulated <- res_sig[res_sig$log2FoldChange < -threshold_log2fc, ]
cat("\nSur-exprimés (Tumor > Normal):", nrow(up_regulated), "\n")#>
#> Sur-exprimés (Tumor > Normal): 1994
#> Sous-exprimés (Tumor < Normal): 2322
# Ratio
ratio <- round(nrow(up_regulated) / nrow(down_regulated), 2)
cat("\nRatio Up/Down:", ratio, "\n")#>
#> Ratio Up/Down: 0.86
QUESTION 6a : Combien de gènes sont différentiellement exprimés au total?
QUESTION 6b : Observe-t-on plus de gènes sur-exprimés ou sous-exprimés?
QUESTION 6c : Que suggère ce déséquilibre sur la biologie du cancer colorectal?
# Convertir en dataframe
res_df <- as.data.frame(res)
res_df$gene <- rownames(res_df)
# Ajouter annotations (symboles de gènes)
library(AnnotationDbi)
library(org.Hs.eg.db)
# Mapper Ensembl IDs vers symboles
res_df$symbol <- mapIds(org.Hs.eg.db,
keys = res_df$gene,
column = "SYMBOL",
keytype = "SYMBOL",
multiVals = "first")
# Ajouter nom complet du gène
res_df$genename <- mapIds(org.Hs.eg.db,
keys = res_df$gene,
column = "GENENAME",
keytype = "SYMBOL",
multiVals = "first")
# Aperçu
cat("=== DATAFRAME AVEC ANNOTATIONS ===\n")#> === DATAFRAME AVEC ANNOTATIONS ===
# Même chose pour les gènes significatifs
res_sig_df <- as.data.frame(res_sig)
res_sig_df$gene <- rownames(res_sig_df)
res_sig_df$symbol <- mapIds(org.Hs.eg.db,
keys = res_sig_df$gene,
column = "SYMBOL",
keytype = "SYMBOL",
multiVals = "first")# Sauvegarder tous les résultats
write.csv(res_df,
"results/tables/DESeq2_results_all_genes.csv",
row.names = FALSE)
# Sauvegarder les gènes significatifs
write.csv(res_sig_df,
"results/tables/DESeq2_results_significant_genes.csv",
row.names = FALSE)
# Sauvegarder top up/down
write.csv(as.data.frame(up_regulated),
"results/tables/DESeq2_upregulated_genes.csv",
row.names = TRUE)
write.csv(as.data.frame(down_regulated),
"results/tables/DESeq2_downregulated_genes.csv",
row.names = TRUE)
cat("Résultats sauvegardés dans results/tables/\n")#> Résultats sauvegardés dans results/tables/
MA PLOT
Permet de voir si il y a un biais lié au niveau d’expression.
# Préparer les données
res_df <- as.data.frame(res) %>%
mutate(
regulation = case_when(
padj < 0.05 & log2FoldChange > 1 ~ "Up",
padj < 0.05 & log2FoldChange < -1 ~ "Down",
TRUE ~ "NS"
)
)
# Créer le MA plot
p_ma <- ggplot(res_df, aes(x = baseMean, y = log2FoldChange, color = regulation)) +
geom_point(alpha = 0.4, size = 0.8) +
scale_x_log10(
labels = scales::scientific,
breaks = scales::trans_breaks("log10", function(x) 10^x)
) +
scale_color_manual(
values = c("Up" = "#E41A1C", "Down" = "#377EB8", "NS" = "grey60"),
labels = c(
"Up" = paste0("Sur-exprimés (", nrow(up_regulated), ")"),
"Down" = paste0("Sous-exprimés (", nrow(down_regulated), ")"),
"NS" = "Non significatif"
),
name = "Régulation"
) +
geom_hline(yintercept = c(-1, 1), linetype = "dashed",
color = "black", size = 0.5) +
geom_hline(yintercept = 0, linetype = "solid",
color = "black", size = 0.3) +
labs(
title = "MA Plot - Analyse Différentielle",
subtitle = "Cancer Colorectal TCGA-COAD (FDR < 0.05, |log2FC| > 1)",
x = "Expression Moyenne (log10)",
y = "Log2 Fold Change (Tumor / Normal)"
) +
theme_minimal(base_size = 12) +
theme(
legend.position = "right",
plot.title = element_text(face = "bold", size = 14),
axis.title = element_text(face = "bold"),
panel.grid.minor = element_blank()
)
print(p_ma)QUESTION 7a : Les gènes très fortement
différentiellement exprimés ont-ils tendance
à être fortement ou faiblement exprimés?
QUESTION 7b : Pourquoi y a-t-il un “entonnoir”
(variance plus faible)
pour les gènes très exprimés?
QUESTION 7c : Identifiez-vous un biais technique potentiel?
#> === VOLCANO PLOT ===
# S'assurer que les p-values ne sont pas nulles
res$padj[res$padj == 0] <- min(res$padj[res$padj > 0]) * 1e-1
# Top gènes à annoter : up et down
top20_up <- res[order(res$log2FoldChange, decreasing = TRUE), ][1:20, ]
top20_down <- res[order(res$log2FoldChange, decreasing = FALSE), ][1:20, ]
genes_to_label <- c(
rownames(top20_up)[1:8], # Top 8 up
rownames(top20_down)[1:8] # Top 8 down
)
# Volcano plot avec EnhancedVolcano
p_volcano <- EnhancedVolcano(
res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'padj',
selectLab = genes_to_label,
title = 'Volcano Plot - Analyse Différentielle',
subtitle = 'Cancer Colorectal TCGA-COAD (Tumor vs Normal)',
pCutoff = 0.05,
FCcutoff = 1,
pointSize = 2.5,
labSize = 5.0,
labCol = 'black',
labFace = 'bold',
boxedLabels = TRUE,
parseLabels = FALSE,
col = c('grey30', 'forestgreen', 'royalblue', 'red2'), # couleurs par catégorie
colAlpha = 0.6,
legendLabels = c(
'NS',
expression(log[2]~FC),
'p-value',
expression(p-value~and~log[2]~FC)
),
legendPosition = 'right',
legendLabSize = 12,
legendIconSize = 4.0,
drawConnectors = TRUE,
widthConnectors = 0.5,
colConnectors = 'grey30',
arrowheads = FALSE,
gridlines.major = TRUE,
gridlines.minor = FALSE,
border = 'partial',
borderWidth = 1.5,
borderColour = 'black',
max.overlaps = 20
)
print(p_volcano)# Sauvegarder en PDF
ggsave("figures/DEG/volcano_plot.pdf", p_volcano, width = 14, height = 10)
cat("✓ Volcano plot sauvegardé : figures/DEG/volcano_plot.pdf\n")#> ✓ Volcano plot sauvegardé : figures/DEG/volcano_plot.pdf
QUESTION 8a : Identifiez le gène avec le log2FC le plus élevé. Quelle est sa fonction?
QUESTION 8b : Identifiez le gène avec la p-value ajustée la plus faible.
QUESTION 8c : Quels gènes se situent dans le “coin
supérieur droit”
(fort FC + très significatif)?
QUESTION 8d : Ces gènes sont-ils cohérents avec la biologie du CRC?
# Sélectionner top 50 gènes
top_genes <- head(rownames(res_sig), 50)
# Extraire comptages VST
top_counts <- vst_counts[top_genes, ]
# Z-score (centrer et réduire par gène)
top_counts_scaled <- t(scale(t(top_counts)))
# Remplacer Ensembl IDs par symboles pour labels
gene_labels <- res_df$symbol[match(rownames(top_counts_scaled), res_df$gene)]
gene_labels[is.na(gene_labels)] <- rownames(top_counts_scaled)[is.na(gene_labels)]
rownames(top_counts_scaled) <- gene_labels
# Annotations colonnes
annotation_col <- data.frame(
Type = metadata$condition,
row.names = colnames(top_counts_scaled)
)
annotation_colors <- list(
Type = c(Normal = "steelblue", Tumor = "coral")
)
pdf("figures/DEG/heatmap_top50_genes.pdf", width = 12, height = 14)
pheatmap(
top_counts_scaled,
annotation_col = annotation_col,
annotation_colors = annotation_colors,
color = colorRampPalette(c("blue", "white", "red"))(100),
scale = "none",
clustering_distance_rows = "euclidean",
clustering_distance_cols = "euclidean",
clustering_method = "ward.D2",
show_rownames = TRUE,
show_colnames = FALSE,
fontsize_row = 8,
main = "Top 50 Gènes Différentiellement Exprimés (Z-score)"
)#> pdf
#> 3
QUESTION 9a : Observe-t-on un regroupement clair Tumor vs Normal?
QUESTION 9b : Certains gènes ont-ils des patterns d’expression particuliers?
QUESTION 9c : Y a-t-il des sous-groupes de tumeurs visibles?
QUESTION 9d : Comment interpréter les valeurs du Z-score (bleu vs rouge)?
# Top 25 up et top 25 down
up_top <- head(rownames(up_regulated[order(up_regulated$log2FoldChange, decreasing = TRUE), ]), 25)
down_top <- head(rownames(down_regulated[order(down_regulated$log2FoldChange, decreasing = FALSE), ]), 25)
selected_genes <- c(up_top, down_top)
# Extraire comptages
selected_counts <- vst_counts[selected_genes, ]
selected_counts_scaled <- t(scale(t(selected_counts)))
# Labels
gene_labels <- res_df$symbol[match(rownames(selected_counts_scaled), res_df$gene)]
gene_labels[is.na(gene_labels)] <- rownames(selected_counts_scaled)[is.na(gene_labels)]
rownames(selected_counts_scaled) <- gene_labels
pdf("figures/DEG/heatmap_top_up_down.pdf", width = 14, height = 16)
pheatmap(
selected_counts_scaled,
annotation_col = annotation_col,
annotation_colors = annotation_colors,
color = colorRampPalette(c("blue", "white", "red"))(100),
scale = "none",
clustering_distance_rows = "euclidean",
clustering_distance_cols = "euclidean",
clustering_method = "ward.D2",
show_rownames = TRUE,
show_colnames = FALSE,
fontsize_row = 7,
main = "Top 25 Up et Top 25 Down Regulated Genes"
)#> pdf
#> 3
# Top 5 gènes up‑régulés
top5_up <- head(rownames(up_regulated[order(up_regulated$log2FoldChange, decreasing = TRUE), ]), 5)
# Top 5 gènes down‑régulés
top5_down <- head(rownames(down_regulated[order(down_regulated$log2FoldChange, decreasing = FALSE), ]), 5)
plot_gene_counts <- function(gene_id, dds, metadata) {
# Extraire comptages normalisés
counts_norm <- counts(dds, normalized = TRUE)[gene_id, ]
# Récupérer symbole du gène
symbol <- res_df$symbol[res_df$gene == gene_id]
if(length(symbol) == 0 || is.na(symbol)) symbol <- gene_id
# log2FC et padj
log2fc <- res[gene_id, "log2FoldChange"]
padj <- res[gene_id, "padj"]
padj_label <- ifelse(
is.na(padj),
"NA",
ifelse(padj < 1e-10,
"< 1e-10",
formatC(padj, format = "e", digits = 2))
)
# Créer dataframe pour ggplot
df_plot <- data.frame(
Sample = names(counts_norm),
Counts = counts_norm,
Type = metadata$condition
)
# Créer plot
p <- ggplot(df_plot, aes(x = Type, y = Counts, fill = Type)) +
geom_boxplot(outlier.shape = NA, alpha = 0.7) +
geom_jitter(width = 0.2, size = 1, alpha = 0.5) +
scale_fill_manual(values = c("Normal" = "steelblue", "Tumor" = "coral")) +
labs(
title = paste0(symbol, " (", gene_id, ")"),
subtitle = paste0("log2FC = ", round(log2fc, 2),
", padj = ", padj_label),
y = "Normalized Counts"
) +
theme_minimal() +
theme(legend.position = "none")
return(p)
}
# top5 up
plots_up <- lapply(top5_up, plot_gene_counts, dds = dds, metadata = metadata)
combined_up <- wrap_plots(plots_up, ncol = 3)
print(combined_up)ggsave("figures/DEG/count_plots_top5_upregulated.pdf", combined_up, width = 14, height = 10)
cat("✅ Top 5 up‑regulated saved: figures/DEG/count_plots_top5_upregulated.pdf\n")#> ✅ Top 5 up‑regulated saved: figures/DEG/count_plots_top5_upregulated.pdf
# -------------------------------
# Créer plots pour top5 down
# -------------------------------
plots_down <- lapply(top5_down, plot_gene_counts, dds = dds, metadata = metadata)
combined_down <- wrap_plots(plots_down, ncol = 3)
print(combined_down)ggsave("figures/DEG/count_plots_top5_downregulated.pdf", combined_down, width = 14, height = 10)
cat("✅ Top 5 down‑regulated saved: figures/DEG/count_plots_top5_downregulated.pdf\n")#> ✅ Top 5 down‑regulated saved: figures/DEG/count_plots_top5_downregulated.pdf
#> === PRÉPARATION DES LISTES DE GÈNES ===
#> Exemple de rownames(res_sig): HIF3A RTN4RL2 CAMK4 RNF112 SPN LRRTM1
# Utiliser SYMBOL comme keytype
genes_all <- rownames(res_sig)
genes_entrez <- mapIds(org.Hs.eg.db,
keys = genes_all,
column = "ENTREZID",
keytype = "SYMBOL",
multiVals = "first")
# Enlever NA
genes_entrez <- genes_entrez[!is.na(genes_entrez)]
cat("Gènes significatifs (SYMBOL):", length(genes_all), "\n")#> Gènes significatifs (SYMBOL): 4316
#> Gènes avec Entrez ID: 3757
# Liste UP
genes_up <- rownames(up_regulated)
genes_up_entrez <- mapIds(org.Hs.eg.db,
keys = genes_up,
column = "ENTREZID",
keytype = "SYMBOL",
multiVals = "first")
genes_up_entrez <- genes_up_entrez[!is.na(genes_up_entrez)]
cat("\nGènes UP (Entrez):", length(genes_up_entrez), "\n")#>
#> Gènes UP (Entrez): 1702
# Liste DOWN
genes_down <- rownames(down_regulated)
genes_down_entrez <- mapIds(org.Hs.eg.db,
keys = genes_down,
column = "ENTREZID",
keytype = "SYMBOL",
multiVals = "first")
genes_down_entrez <- genes_down_entrez[!is.na(genes_down_entrez)]
cat("Gènes DOWN (Entrez):", length(genes_down_entrez), "\n")#> Gènes DOWN (Entrez): 2055
#> === GO ENRICHMENT: BIOLOGICAL PROCESS ===
# Over-Representation Analysis (ORA)
ego_bp <- enrichGO(
gene = genes_entrez,
OrgDb = org.Hs.eg.db,
ont = "BP", # Biological Process
pAdjustMethod = "BH",
pvalueCutoff = 0.05,
qvalueCutoff = 0.05,
readable = TRUE # Convertir en symboles
)
cat("Termes GO-BP enrichis:", nrow(ego_bp), "\n\n")#> Termes GO-BP enrichis: 1520
if(nrow(ego_bp) > 0) {
# Aperçu
cat("Top 10 termes GO-BP:\n")
print(head(as.data.frame(ego_bp), 10))
# Dotplot
p_go_bp <- dotplot(ego_bp, showCategory = 20) +
ggtitle("GO Biological Process Enrichment") +
theme(axis.text.y = element_text(size = 10))
print(p_go_bp)
ggsave("figures/enrichment/GO_BP_dotplot.pdf", p_go_bp, width = 12, height = 10)
# Barplot
p_go_bp_bar <- barplot(ego_bp, showCategory = 15) +
ggtitle("Top 15 GO-BP Terms")
print(p_go_bp_bar)
ggsave("figures/enrichment/GO_BP_barplot.pdf", p_go_bp_bar, width = 12, height = 10)
# Sauvegarder résultats
write.csv(as.data.frame(ego_bp),
"results/enrichment/GO_BP_enrichment.csv",
row.names = FALSE)
}#> Top 10 termes GO-BP:
#> ID Description
#> GO:0006936 GO:0006936 muscle contraction
#> GO:0003012 GO:0003012 muscle system process
#> GO:0003018 GO:0003018 vascular process in circulatory system
#> GO:0030198 GO:0030198 extracellular matrix organization
#> GO:0043062 GO:0043062 extracellular structure organization
#> GO:0045229 GO:0045229 external encapsulating structure organization
#> GO:0050900 GO:0050900 leukocyte migration
#> GO:0034765 GO:0034765 regulation of monoatomic ion transmembrane transport
#> GO:0006935 GO:0006935 chemotaxis
#> GO:0042330 GO:0042330 taxis
#> GeneRatio BgRatio pvalue
#> GO:0006936 143/3449 351/18870 0.00000000000000000000003718072
#> GO:0003012 168/3449 452/18870 0.00000000000000000000070843054
#> GO:0003018 113/3449 269/18870 0.00000000000000000008550623035
#> GO:0030198 125/3449 321/18870 0.00000000000000000177632895889
#> GO:0043062 125/3449 322/18870 0.00000000000000000238938120244
#> GO:0045229 125/3449 323/18870 0.00000000000000000320769496563
#> GO:0050900 140/3449 396/18870 0.00000000000000027147730096656
#> GO:0034765 153/3449 454/18870 0.00000000000000138541385842614
#> GO:0006935 155/3449 468/18870 0.00000000000000494992456710502
#> GO:0042330 155/3449 470/18870 0.00000000000000747246729934852
#> p.adjust qvalue
#> GO:0006936 0.0000000000000000002307436 0.0000000000000000001585856
#> GO:0003012 0.0000000000000000021982600 0.0000000000000000015108213
#> GO:0003018 0.0000000000000001768838885 0.0000000000000001215688580
#> GO:0030198 0.0000000000000027559743797 0.0000000000000018941276162
#> GO:0043062 0.0000000000000029656999485 0.0000000000000020382679226
#> GO:0045229 0.0000000000000033178258261 0.0000000000000022802771931
#> GO:0050900 0.0000000000002406840185426 0.0000000000001654174471453
#> GO:0034765 0.0000000000010747348006741 0.0000000000007386443360977
#> GO:0006935 0.0000000000034132479848282 0.0000000000023458589878257
#> GO:0042330 0.0000000000046374132059757 0.0000000000031872039470484
#> geneID
#> GO:0006936 ATP2A1/ITGA2/ATP1A2/SPHK1/CLIC2/TMOD2/CLCN1/DMD/SRI/NMU/PTGS2/SCN2B/SGCA/GJC1/AKAP9/TRPV4/ADRA2C/CCDC78/FKBP1B/TNNI3K/EDNRA/EDNRB/PPP1R13L/ARHGAP42/TNNT1/CAV1/KCNA5/STC1/DES/SCN3B/DTNA/DMPK/SLMAP/KIT/GPD1L/STAC/PIK3CG/TPM2/CALCA/F2R/GNAO1/CKMT2/RGS2/MYH11/TNNI3/CACNB2/CACNB1/OXTR/SCN1B/PLCE1/CHRM2/HTR1D/SCNN1B/SNTB1/PPP1R12B/SCN7A/FGF13/NOS1/JSRP1/SLC9A1/CNN1/TRPM4/ADRA2A/SLC8A3/CACNA2D1/ATP2B4/CHRNA1/CHRNA3/TACR2/TACR1/UTS2/CACNA1H/CACNA1C/CACNA1D/TMOD1/KCNJ3/RCSD1/SSPN/P2RX1/P2RX4/TBX20/ADRA1B/MYOCD/COMP/GSTM2/DSC2/GRIP2/PRKG1/LMOD1/LMOD3/TPM1/NEDD4L/CALD1/DRD2/APBB1/GATA4/TNNT2/SSTR2/LTB4R/MYOT/KCND3/SMTN/PDE5A/SCN11A/BDKRB2/HOMER1/CD38/ADRB2/FXYD1/SMPX/SULF1/HRC/SLC8A1/SCN4A/SCN4B/ACTA2/PLN/PTGER3/ANK2/TNNC1/GSN/TIFAB/SLC6A8/CASQ2/HTR7/KCNMA1/FLNA/NMUR1/MYL6/MYL9/MYLK/SYNM/CHRNB2/CHRNB4/CHGA/PDE4D/MAP2K6/ABCC9/RYR1/RYR3/EDN3/EDN2/CRYAB
#> GO:0003012 ATP2A1/ITGA2/ATP1A2/SPHK1/CLIC2/TMOD2/CLCN1/DMD/SRI/NMU/PTGS2/SCN2B/SGCA/GJC1/IL6ST/AKAP6/AKAP9/TRPV4/ADRA2C/CCDC78/FKBP1B/KLF15/TNNI3K/EDNRA/EDNRB/FBXO32/PPP1R13L/PI16/ARHGAP42/TNNT1/CAV1/MEIS1/KCNA5/STC1/DES/SCN3B/DTNA/DMPK/LMCD1/SLMAP/SLN/KIT/GPD1L/STAC/AGT/PDE9A/PIK3CG/TPM2/CALCA/F2R/GNAO1/CKMT2/RGS2/MYH11/TRIM72/TNNI3/CACNB2/CACNB1/OXTR/SCN1B/PLCE1/SORBS2/MIR17HG/CHRM2/HTR1D/SCNN1B/SNTB1/PPP1R12B/NR4A3/SCN7A/FGF13/HMOX1/MYOC/NOS1/JSRP1/SLC9A1/CNN1/TRPM4/ADRA2A/SLC8A3/CACNA2D1/ATP2B4/CHRNA1/CHRNA3/TACR2/TACR1/UTS2/IGF1/CACNA1H/CACNA1C/CACNA1D/TMOD1/KCNJ3/RCSD1/SSPN/P2RX1/P2RX4/TBX20/ADRA1B/MYOCD/IL1B/COMP/GSTM2/DSC2/GRIP2/PRKG1/LMOD1/LMOD3/TPM1/NEDD4L/CALD1/P2RY1/DRD2/APBB1/GATA4/TNNT2/SSTR2/LTB4R/MYOT/KCND3/SMTN/PDE5A/SCN11A/TIAM1/BDKRB2/HOMER1/CD38/ADRB2/FXYD1/SMPX/SULF1/CAMK2B/HRC/GTF2IRD1/SLC8A1/SCN4A/SCN4B/ACTA2/PLN/PTGER3/ANK2/TNNC1/EZH2/GSN/TIFAB/SLC6A8/CASQ2/HTR7/KCNMA1/FLNA/HAND2/NMUR1/MYL6/MYL9/MYLK/SYNM/CHRNB2/CHRNB4/CHGA/PDE4D/MAP2K6/ABCC8/ABCC9/RYR1/RYR3/EDN3/EDN2/CRYAB
#> GO:0003018 SLC12A2/ATP2A3/ATP1A2/PTGS2/GPR4/ABCG2/SLC6A20/TRPV4/ADRA2C/ANGPT1/HBB/SLC22A3/EDNRA/EDNRB/CYSLTR1/ECE1/ARHGAP42/CAV1/KCNA5/KAT2B/CLDN5/FOXC1/AGT/SLC7A5/TJP3/CALCA/F2R/RGS2/PTP4A3/BMP6/SLC1A2/SLC2A13/OXTR/SLC7A1/PDE3A/FERMT2/HTR1D/SCNN1B/SERPINF2/P2RY2/SLC6A1/SLC4A4/SLC4A8/NOS1/FGA/FGB/ATP1B2/ABCB1/TRPM4/ADRA2A/SLC8A2/SLC28A2/ATP2B4/TACR2/TACR1/UTS2/SLC15A2/KCNMB1/P2RX1/ADRA1B/ADRA1D/COMP/GRIP2/ATP8A1/PRKG1/LEPR/P2RY1/C2CD4A/C2CD4B/DRD5/SLC29A1/ADORA2A/TEK/ZEB2/BDKRB2/F2RL1/CD38/CD36/ADRB1/ADRB2/NPR1/KLF2/SLIT2/SLC2A1/SLC1A5/CEACAM1/PLOD3/KNG1/SLC8A1/PDE2A/ACTA2/VEGFA/AKAP12/SLC38A5/SLC38A3/SLC5A6/SLC6A6/SH3GL2/HTR7/KCNMA1/SLC6A9/SLC1A1/SLC22A5/AGTR1/ABCC8/ABCC9/ABCC1/ABCC2/EDN3/EDN2/CPS1/SLC2A4/SLC13A3
#> GO:0030198 COL7A1/ITGA8/COL4A5/COL4A1/DPT/LOX/BMP2/COL14A1/MMP25/MATN3/MATN2/FKBP10/PDPN/RECK/PRSS1/TNFRSF11B/COL10A1/COL28A1/CAV1/COL4A6/SERPINB5/SMOC1/COL11A1/COL11A2/VIT/ZNF469/FOXC1/FOXF1/FOXF2/AGT/COL23A1/GAS2/FLRT2/FSCN1/COL1A2/MYH11/COL27A1/RUNX1/ADAMTSL2/ADAMTSL3/ADAMTSL1/ADAMTSL4/ADAMTSL5/CTSG/POSTN/FGFR4/CTSS/ADAMTS2/ADAMTS8/PTX3/FERMT1/FBLN1/FBLN5/MMP8/MMP9/MMP7/MMP3/SERPINF2/FAP/COL9A2/COL9A3/COL9A1/MMP1/MFAP4/COL22A1/ITGB3/MMP28/COL5A2/COL5A3/COL5A1/COL8A1/KLK7/SOX9/MAD2L2/DDR2/COL1A1/COL17A1/WDR72/TGFBI/SMPD3/COL12A1/COMP/IBSP/HPSE2/ANGPTL7/MMP14/MMP10/MMP11/MMP12/MMP13/PRDX4/PDGFRA/TPSAB1/GPM6B/LAMA1/SERPINH1/LOXL2/LOXL4/ABI3BP/AXIN2/ADAMTS12/ADAMTS14/ADAMTS18/SPINK5/HPN/C6orf15/WT1/ADAM8/EGFL6/SULF1/COL3A1/PLOD3/TLL1/CCDC80/OLFML2B/OLFML2A/NTN4/ADAMTS4/ADAMTS7/ADAMTS6/ADAMTS1/COL24A1/TNXB/CMA1/APLP1
#> GO:0043062 COL7A1/ITGA8/COL4A5/COL4A1/DPT/LOX/BMP2/COL14A1/MMP25/MATN3/MATN2/FKBP10/PDPN/RECK/PRSS1/TNFRSF11B/COL10A1/COL28A1/CAV1/COL4A6/SERPINB5/SMOC1/COL11A1/COL11A2/VIT/ZNF469/FOXC1/FOXF1/FOXF2/AGT/COL23A1/GAS2/FLRT2/FSCN1/COL1A2/MYH11/COL27A1/RUNX1/ADAMTSL2/ADAMTSL3/ADAMTSL1/ADAMTSL4/ADAMTSL5/CTSG/POSTN/FGFR4/CTSS/ADAMTS2/ADAMTS8/PTX3/FERMT1/FBLN1/FBLN5/MMP8/MMP9/MMP7/MMP3/SERPINF2/FAP/COL9A2/COL9A3/COL9A1/MMP1/MFAP4/COL22A1/ITGB3/MMP28/COL5A2/COL5A3/COL5A1/COL8A1/KLK7/SOX9/MAD2L2/DDR2/COL1A1/COL17A1/WDR72/TGFBI/SMPD3/COL12A1/COMP/IBSP/HPSE2/ANGPTL7/MMP14/MMP10/MMP11/MMP12/MMP13/PRDX4/PDGFRA/TPSAB1/GPM6B/LAMA1/SERPINH1/LOXL2/LOXL4/ABI3BP/AXIN2/ADAMTS12/ADAMTS14/ADAMTS18/SPINK5/HPN/C6orf15/WT1/ADAM8/EGFL6/SULF1/COL3A1/PLOD3/TLL1/CCDC80/OLFML2B/OLFML2A/NTN4/ADAMTS4/ADAMTS7/ADAMTS6/ADAMTS1/COL24A1/TNXB/CMA1/APLP1
#> GO:0045229 COL7A1/ITGA8/COL4A5/COL4A1/DPT/LOX/BMP2/COL14A1/MMP25/MATN3/MATN2/FKBP10/PDPN/RECK/PRSS1/TNFRSF11B/COL10A1/COL28A1/CAV1/COL4A6/SERPINB5/SMOC1/COL11A1/COL11A2/VIT/ZNF469/FOXC1/FOXF1/FOXF2/AGT/COL23A1/GAS2/FLRT2/FSCN1/COL1A2/MYH11/COL27A1/RUNX1/ADAMTSL2/ADAMTSL3/ADAMTSL1/ADAMTSL4/ADAMTSL5/CTSG/POSTN/FGFR4/CTSS/ADAMTS2/ADAMTS8/PTX3/FERMT1/FBLN1/FBLN5/MMP8/MMP9/MMP7/MMP3/SERPINF2/FAP/COL9A2/COL9A3/COL9A1/MMP1/MFAP4/COL22A1/ITGB3/MMP28/COL5A2/COL5A3/COL5A1/COL8A1/KLK7/SOX9/MAD2L2/DDR2/COL1A1/COL17A1/WDR72/TGFBI/SMPD3/COL12A1/COMP/IBSP/HPSE2/ANGPTL7/MMP14/MMP10/MMP11/MMP12/MMP13/PRDX4/PDGFRA/TPSAB1/GPM6B/LAMA1/SERPINH1/LOXL2/LOXL4/ABI3BP/AXIN2/ADAMTS12/ADAMTS14/ADAMTS18/SPINK5/HPN/C6orf15/WT1/ADAM8/EGFL6/SULF1/COL3A1/PLOD3/TLL1/CCDC80/OLFML2B/OLFML2A/NTN4/ADAMTS4/ADAMTS7/ADAMTS6/ADAMTS1/COL24A1/TNXB/CMA1/APLP1
#> GO:0050900 SPN/SLC12A2/ITGA2/ITGA7/CHST4/ITGAL/GPR15/RAC3/ZP3/LGALS3/BMP5/ADD2/CCL28/CCL21/CCL20/CCL23/CCL25/CCL24/CCL26/CD200R1/TRPV4/RARRES2/PODXL2/IL6R/TBX21/CNR2/GP1BA/SAA1/EDNRB/IL23A/CX3CR1/PTPRO/F7/MSMP/ROR2/KIT/MCOLN2/PF4/WASL/AIF1/GPR183/PIK3CG/JAM3/CD177/CSF1R/XCL2/CALCA/CXCL11/CXCL10/CXCL13/CXCL12/CXCL17/JAM2/BSG/NCKAP1L/MIF/CTSG/CXCL1/CXCL3/CXCL2/CXCL5/CXCL6/SCG2/VAV1/PGF/SLAMF1/THBS4/SPNS2/TNFSF11/HMOX1/ITGB3/ITGB7/LBP/MMP28/HCK/CNN2/TRPM4/CCR2/CCR7/XCL1/PDGFD/TACR1/CMKLR1/LYVE1/PTN/P2RX4/CCL3/CCL8/PADI2/TRPM2/SMPD3/IL1B/IL10/IL16/IL1A/MAPK3/CCL11/CCL13/CCL14/CCL15/CCL19/THY1/MMP14/P2RY12/IL33/PTK2B/CSF1/NOD2/WNT5A/DPEP1/SERPINE1/IL17A/ASCL2/BST1/BDKRB1/ASB2/F2RL1/CCL5/MADCAM1/PPBP/ADAM8/TNFRSF11A/KLRK1/SLIT2/DUSP1/VEGFA/PTGER4/CXCR5/TREM1/TREM2/NKX2-3/RIPK3/STAP1/GPR18/TNFAIP6/CHGA/KITLG/EDN3/EDN2/SELE
#> GO:0034765 ATP2A1/GRP/ATP1A2/HAMP/CLIC5/CLIC2/CLIC3/P2RX7/CLCN2/CLCN1/DMD/SRI/LRRC26/SCN2B/CHD7/KCNK5/GRIN2B/GRIN2D/AKAP5/AKAP6/AKAP9/KCNAB1/HVCN1/FKBP1B/AHNAK/PCSK9/EDNRA/CABP4/TESC/CAV1/KCNA5/KCNA3/BCL2/SCN3A/SCN3B/STIM1/PLP1/SLMAP/SLN/GPD1L/HPCA/STAC/PIK3CG/KCNIP4/KCNIP3/CHP2/KCNG1/KCNG3/F2R/NTSR1/KCNQ4/CACNB2/CACNB1/CXCL11/CXCL10/HCN1/SCN1B/CTSS/TMEM37/GRIN2A/VAMP2/JPH2/JPH3/JPH1/JPH4/NLGN3/MMP9/HECW1/HECW2/SCN7A/FGF13/KCNJ11/KCNJ15/KCNJ14/NOS1/ITGB3/ATP1B2/SCN9A/STOM/ABCB1/JSRP1/SLC9A1/TRPM5/ADRA2A/CCR2/CACNA2D1/ATP2B4/CACNG4/XCL1/TLR9/KCNH4/KCNH1/KCNH8/RELN/NPSR1/CACNA1H/CACNA1A/CACNA1C/CACNA1D/CACNA1E/SLC31A2/KCNJ3/KCNMB1/P2RX1/P2RX4/LRRC55/GSTM2/MS4A1/KCNN4/OSR1/THY1/PIRT/KCNK10/PTK2B/TRPC6/NEDD4L/PDZK1/DRD2/KCNB1/F2/KCND3/SCN11A/BDKRB1/HOMER1/KCNK15/ADRB2/FXYD6/FXYD5/FXYD2/FXYD3/FXYD1/CD19/HRC/SNCA/DPP6/SLC8A1/SCN4A/SCN4B/KCNQ5/PLN/ANK2/ANK3/DPP10/TREM2/EPHB2/FHL1/KCNS3/CASQ2/KCNMA1/FLNA/SLC6A9/PDE4D/EDN3
#> GO:0006935 SPN/SLC12A2/ITGA2/RAC3/GAB1/LGALS3/LOX/SEMA6D/NTRK3/BMP4/CCL28/CCL21/CCL20/CCL23/CCL25/CCL24/CCL26/TRPV4/RARRES2/FGFR1/ANGPT2/ANGPT1/IL6R/UNC5C/WNT3/CNR2/SAA1/LGR6/EDNRB/IL23A/CYSLTR1/CX3CR1/PTPRO/F7/SERPIND1/CMTM8/MSMP/CMTM7/KIT/L1CAM/FOSL1/PF4/AIF1/GPR183/PIK3CG/JAM3/CSF1R/XCL2/CALCA/FLRT2/CXCL11/CXCL10/CXCL13/CXCL12/CXCL17/NTN1/BSG/NCKAP1L/MIF/LPAR1/ROBO2/CTSG/CXCL1/CXCL3/CXCL2/CXCL5/CXCL6/SCG2/VAV1/LEF1/PGF/NRG1/SEMA4G/SLAMF1/PLD1/DOCK2/EPHA7/FGF18/SEMA3E/SEMA3D/SEMA3G/THBS4/TNFSF11/LSP1/ITGB3/LBP/MMP28/TRPM4/CCR2/CCR3/CCR7/CCR8/CCR9/EFNA5/XCL1/PDGFD/BIN2/CMKLR1/PTN/P2RX4/CCL3/CCL8/PADI2/TRPM2/IL1B/IL10/IL16/MAPK3/CCL11/CCL13/CCL14/CCL15/CCL19/P2RY12/PDGFRA/PTK2B/CSF1/NOD2/MET/WNT5A/DPEP1/SERPINE1/BST1/TUBB2B/TIAM1/SLIT3/F2RL1/CCL5/PPBP/CCR10/ADAM8/TNFRSF11A/SEMA3B/KLRK1/SLIT2/SEMA6A/FGF7/FGF2/DEFB1/DUSP1/VEGFA/CXCR5/TREM1/TREM2/STAP1/EPHB1/GPR18/TNFAIP6/AGTR1/CHGA/ABCC1/EDN3/EDN2/ENPP2/PLAU
#> GO:0042330 SPN/SLC12A2/ITGA2/RAC3/GAB1/LGALS3/LOX/SEMA6D/NTRK3/BMP4/CCL28/CCL21/CCL20/CCL23/CCL25/CCL24/CCL26/TRPV4/RARRES2/FGFR1/ANGPT2/ANGPT1/IL6R/UNC5C/WNT3/CNR2/SAA1/LGR6/EDNRB/IL23A/CYSLTR1/CX3CR1/PTPRO/F7/SERPIND1/CMTM8/MSMP/CMTM7/KIT/L1CAM/FOSL1/PF4/AIF1/GPR183/PIK3CG/JAM3/CSF1R/XCL2/CALCA/FLRT2/CXCL11/CXCL10/CXCL13/CXCL12/CXCL17/NTN1/BSG/NCKAP1L/MIF/LPAR1/ROBO2/CTSG/CXCL1/CXCL3/CXCL2/CXCL5/CXCL6/SCG2/VAV1/LEF1/PGF/NRG1/SEMA4G/SLAMF1/PLD1/DOCK2/EPHA7/FGF18/SEMA3E/SEMA3D/SEMA3G/THBS4/TNFSF11/LSP1/ITGB3/LBP/MMP28/TRPM4/CCR2/CCR3/CCR7/CCR8/CCR9/EFNA5/XCL1/PDGFD/BIN2/CMKLR1/PTN/P2RX4/CCL3/CCL8/PADI2/TRPM2/IL1B/IL10/IL16/MAPK3/CCL11/CCL13/CCL14/CCL15/CCL19/P2RY12/PDGFRA/PTK2B/CSF1/NOD2/MET/WNT5A/DPEP1/SERPINE1/BST1/TUBB2B/TIAM1/SLIT3/F2RL1/CCL5/PPBP/CCR10/ADAM8/TNFRSF11A/SEMA3B/KLRK1/SLIT2/SEMA6A/FGF7/FGF2/DEFB1/DUSP1/VEGFA/CXCR5/TREM1/TREM2/STAP1/EPHB1/GPR18/TNFAIP6/AGTR1/CHGA/ABCC1/EDN3/EDN2/ENPP2/PLAU
#> Count
#> GO:0006936 143
#> GO:0003012 168
#> GO:0003018 113
#> GO:0030198 125
#> GO:0043062 125
#> GO:0045229 125
#> GO:0050900 140
#> GO:0034765 153
#> GO:0006935 155
#> GO:0042330 155
QUESTION 10a : Quels sont les 3 processus biologiques les plus enrichis?
QUESTION 10b : Ces processus sont-ils cohérents avec les hallmarks du cancer?
QUESTION 10c : Identifiez-vous des processus liés à l’immunité/inflammation?
QUESTION 10d : Quel est le “GeneRatio” et que signifie-t-il?
#> === GO ENRICHMENT: GÈNES UP-RÉGULÉS ===
ego_bp_up <- enrichGO(
gene = genes_up_entrez,
OrgDb = org.Hs.eg.db,
ont = "BP",
pAdjustMethod = "BH",
pvalueCutoff = 0.05,
readable = TRUE
)
cat("Termes GO-BP (UP):", nrow(ego_bp_up), "\n\n")#> Termes GO-BP (UP): 666
if(nrow(ego_bp_up) > 0) {
p_go_up <- dotplot(ego_bp_up, showCategory = 15) +
ggtitle("GO-BP: Gènes Sur-exprimés (Tumor)")
print(p_go_up)
ggsave("figures/enrichment/GO_BP_upregulated.pdf", p_go_up, width = 12, height = 10)
}#> === GO ENRICHMENT: GÈNES DOWN-RÉGULÉS ===
ego_bp_down <- enrichGO(
gene = genes_down_entrez,
OrgDb = org.Hs.eg.db,
ont = "BP",
pAdjustMethod = "BH",
pvalueCutoff = 0.05,
readable = TRUE
)
cat("Termes GO-BP (DOWN):", nrow(ego_bp_down), "\n\n")#> Termes GO-BP (DOWN): 1116
if(nrow(ego_bp_down) > 0) {
p_go_down <- dotplot(ego_bp_down, showCategory = 15) +
ggtitle("GO-BP: Gènes Sous-exprimés (Normal)")
print(p_go_down)
ggsave("figures/enrichment/GO_BP_downregulated.pdf", p_go_down, width = 12, height = 10)
}# Combiner
if(nrow(ego_bp_up) > 0 && nrow(ego_bp_down) > 0) {
p_combined_go <- p_go_up / p_go_down
ggsave("figures/enrichment/GO_BP_up_vs_down.pdf", p_combined_go, width = 14, height = 16)
}#> === KEGG PATHWAY ENRICHMENT ===
kegg_enrich <- enrichKEGG(
gene = genes_entrez,
organism = "hsa", # Homo sapiens
pvalueCutoff = 0.05,
pAdjustMethod = "BH"
)
cat("Pathways KEGG enrichis:", nrow(kegg_enrich), "\n\n")#> Pathways KEGG enrichis: 84
if(nrow(kegg_enrich) > 0) {
# Aperçu
cat("Top 10 pathways KEGG:\n")
print(head(as.data.frame(kegg_enrich), 10))
# Dotplot
p_kegg <- dotplot(kegg_enrich, showCategory = 20) +
ggtitle("KEGG Pathway Enrichment")
print(p_kegg)
ggsave("figures/enrichment/KEGG_dotplot.pdf", p_kegg, width = 12, height = 10)
# Barplot
p_kegg_bar <- barplot(kegg_enrich, showCategory = 15) +
ggtitle("Top 15 KEGG Pathways")
print(p_kegg_bar)
ggsave("figures/enrichment/KEGG_barplot.pdf", p_kegg_bar, width = 12, height = 10)
# Sauvegarder
write.csv(as.data.frame(kegg_enrich),
"results/enrichment/KEGG_enrichment.csv",
row.names = FALSE)
}#> Top 10 pathways KEGG:
#> category
#> hsa04060 Environmental Information Processing
#> hsa04020 Environmental Information Processing
#> hsa04820 <NA>
#> hsa04061 Environmental Information Processing
#> hsa04974 Organismal Systems
#> hsa04082 <NA>
#> hsa04080 Environmental Information Processing
#> hsa04610 Organismal Systems
#> hsa04024 Environmental Information Processing
#> hsa04062 Organismal Systems
#> subcategory ID
#> hsa04060 Signaling molecules and interaction hsa04060
#> hsa04020 Signal transduction hsa04020
#> hsa04820 <NA> hsa04820
#> hsa04061 Signaling molecules and interaction hsa04061
#> hsa04974 Digestive system hsa04974
#> hsa04082 <NA> hsa04082
#> hsa04080 Signaling molecules and interaction hsa04080
#> hsa04610 Immune system hsa04610
#> hsa04024 Signal transduction hsa04024
#> hsa04062 Immune system hsa04062
#> Description
#> hsa04060 Cytokine-cytokine receptor interaction
#> hsa04020 Calcium signaling pathway
#> hsa04820 Cytoskeleton in muscle cells
#> hsa04061 Viral protein interaction with cytokine and cytokine receptor
#> hsa04974 Protein digestion and absorption
#> hsa04082 Neuroactive ligand signaling
#> hsa04080 Neuroactive ligand-receptor interaction
#> hsa04610 Complement and coagulation cascades
#> hsa04024 cAMP signaling pathway
#> hsa04062 Chemokine signaling pathway
#> GeneRatio BgRatio pvalue p.adjust
#> hsa04060 116/1831 298/9561 0.0000000000000006052336 0.0000000000002094108
#> hsa04020 98/1831 254/9561 0.0000000000002265597081 0.0000000000391948295
#> hsa04820 90/1831 233/9561 0.0000000000020533106846 0.0000000002368151656
#> hsa04061 49/1831 100/9561 0.0000000000126451529765 0.0000000010938057325
#> hsa04974 50/1831 105/9561 0.0000000000297901806092 0.0000000020614804982
#> hsa04082 77/1831 199/9561 0.0000000000751464583088 0.0000000043334457625
#> hsa04080 118/1831 370/9561 0.0000000017180013396440 0.0000000849183519310
#> hsa04610 40/1831 88/9561 0.0000000151007312592004 0.0000006531066269604
#> hsa04024 78/1831 226/9561 0.0000000255496027151809 0.0000009822402821614
#> hsa04062 68/1831 193/9561 0.0000000810568480985331 0.0000028045669442092
#> qvalue
#> hsa04060 0.0000000000001471673
#> hsa04020 0.0000000000275448908
#> hsa04820 0.0000000001664262344
#> hsa04061 0.0000000007686921941
#> hsa04974 0.0000000014487435202
#> hsa04082 0.0000000030454090999
#> hsa04080 0.0000000596779412718
#> hsa04610 0.0000004589827527467
#> hsa04024 0.0000006902875119540
#> hsa04062 0.0000019709612537643
#> geneID
#> hsa04060 3587/4838/655/653/652/651/650/8743/3572/56477/6366/6364/6368/6370/6369/10344/3976/3570/8807/4982/9966/51561/11009/3598/85480/1524/1439/10913/55504/5196/3588/1436/6846/654/939/6373/3627/10563/6387/284340/3557/2919/2921/2920/6374/6372/94/9518/2690/51330/355/268/8600/7292/729230/1232/1236/1237/10803/6375/1271/8744/84957/656/7850/6348/6355/115650/3553/3586/3603/4050/356/3552/6356/6357/6358/6359/6363/3589/3953/90865/8200/1435/1437/1440/3590/8794/8795/130399/768211/3605/27189/9173/8808/970/4804/5008/3977/959/6352/5473/2826/53832/8792/8742/3625/83729/8771/643/23495/608/10148/285613/3624/920
#> hsa04020 814/487/489/8877/5027/4916/5579/5567/2904/2769/2906/2767/2260/84812/2774/1909/1910/57105/10800/5260/108/109/115/6786/6588/255231/810/2149/4923/113026/5333/5021/5137/5153/5136/51196/5582/2264/2263/5255/5737/2903/4485/1129/5332/9965/8817/4842/3706/6547/6543/493/1139/80310/6865/6869/8912/773/775/776/777/3708/5023/5025/147/146/116443/5156/2185/4233/1816/135/623/624/952/153/154/815/816/3270/2254/2252/2247/6546/7422/5350/5733/7134/26281/845/3363/3360/5336/4638/185/2925/6261/6263
#> hsa04820 8516/3673/3679/1287/1282/477/29767/1756/6442/57644/23500/51332/57731/7138/1288/84823/1674/4703/1837/1301/1302/7169/375790/1158/1634/1278/4629/7137/85301/56776/171024/22989/2199/2192/127294/8736/6641/1298/1299/1297/7060/3690/3695/3696/482/1290/50509/1289/22801/7111/1277/8082/1311/1824/25802/56203/6712/7168/81624/7139/9260/64236/27295/284217/9499/165904/4892/84665/271/270/57159/10529/200539/1281/2318/72/8910/287/288/7134/11155/7058/255631/114793/2273/23345/1465/633/10398/1462
#> hsa04061 3587/8743/3572/56477/6366/6364/6368/6370/6369/10344/3570/8807/11009/1524/5196/3588/1436/6846/6373/3627/10563/6387/2919/2921/2920/6374/6372/729230/1232/1236/1237/10803/6375/6348/6355/3586/6356/6357/6358/6359/6363/1435/8794/8795/6352/5473/2826/53832/643
#> hsa04974 1294/1287/1282/206358/477/7512/8645/7373/4224/4225/5644/1300/340267/1288/11136/1301/1302/91522/6519/1278/85301/4311/6550/1298/1299/1297/169044/482/1290/50509/1289/6547/6543/1295/6564/1277/6520/1308/1303/3783/1358/1359/486/1281/6510/6546/255631/81578/6505/340024
#> hsa04082 5368/5579/5567/2904/2906/2767/6571/57030/152/2774/2791/55970/51764/111/108/109/115/2568/2564/2563/246213/2788/2786/2785/54331/2775/6506/5582/5179/2903/1132/1129/3352/5332/60482/5029/6529/63910/2557/2892/2893/150/1139/1136/6865/6869/5023/5025/147/146/116443/2898/2899/2901/9934/64805/53829/2770/5028/1816/1813/285242/3359/135/5443/153/154/2918/2914/140/3363/3360/6536/6505/1141/1143/2560
#> hsa04080 2922/5368/5027/10874/4828/2904/2906/4543/286530/152/6755/7425/5644/1268/1269/1909/1910/57105/10800/7433/2908/27346/27334/7068/2568/2564/2563/7432/183/283869/796/2149/4923/7434/5729/5021/1902/1511/5737/5179/2903/3001/2690/66004/1132/1129/3352/4157/9340/5029/6751/2557/2892/2893/150/1138/1139/1134/1136/6865/6869/10911/6750/387129/4887/5023/5025/147/146/116443/3814/2898/2899/2901/8862/4886/9934/53829/3953/5028/1816/1813/6752/1241/2147/130576/135/5443/623/624/90226/114131/2641/2151/2150/153/154/2918/2914/53637/3827/10022/5734/5733/5697/6019/140/3363/3360/10316/185/5745/1141/1143/2560/2925/1908/1907
#> hsa04610 3687/3075/3426/629/5104/1675/5055/2155/3053/1380/1378/2149/2162/2/5345/2243/2244/733/717/730/729/4179/10544/5054/5270/2147/2153/714/713/712/11326/1604/623/624/2151/5648/2161/1191/3827/5328
#> hsa04024 814/487/489/477/5881/5567/2904/2906/5906/6755/51/1909/111/108/109/115/64399/7432/810/2149/7434/7137/5602/5021/51196/5139/3776/7409/2903/1129/3352/5337/268/84699/6751/482/2892/2893/6548/493/2737/6662/6750/775/776/10846/116443/5595/4886/2770/1259/1261/1816/1813/6752/64091/64208/135/5443/7074/2641/11149/153/154/486/5348/4881/815/816/5350/5143/5733/627/3360/10398/5144/1908/1907
#> hsa04062 5881/5579/5567/56477/6366/6364/6368/6370/6369/10344/5906/653361/2791/55970/51764/6655/111/108/109/115/1524/2788/2786/2785/5196/54331/5294/6846/6373/3627/10563/6387/2919/2921/2920/6374/6372/7409/1794/5332/3055/729230/1232/1236/1237/10803/23533/6375/408/6348/6355/53358/5595/6356/6357/6358/6359/6363/2185/2770/7074/6352/5473/2826/7454/10235/643/5336
#> Count
#> hsa04060 116
#> hsa04020 98
#> hsa04820 90
#> hsa04061 49
#> hsa04974 50
#> hsa04082 77
#> hsa04080 118
#> hsa04610 40
#> hsa04024 78
#> hsa04062 68
QUESTION 11a : Y a-t-il une voie enrichie pour le cancer ?
QUESTION 11b : Quels autres pathways cancéreux sont détectés? Le pathway hsa05210 est-il enrichi?
QUESTION 11c : Identifiez-vous des voies de
signalisation majeures
(Wnt, MAPK, PI3K…)?
QUESTION 11d : Comment interprétez-vous la présence de “Pathways in cancer”?
** GSEA vs ORA**
ORA (Over-Representation Analysis) : - Utilise une liste de gènes significatifs (seuil arbitraire) - Test hypergéométrique - Perd l’information sur les gènes “presque” significatifs
GSEA (Gene Set Enrichment Analysis) : - Utilise TOUS les gènes classés par fold change - Pas de seuil arbitraire - Plus de puissance statistique - Détecte des effets faibles mais coordonnés
#> === GENE SET ENRICHMENT ANALYSIS (GSEA) ===
# Préparer la liste rangée
# Tous les gènes avec leur log2FC
gene_list_full <- res$log2FoldChange
names(gene_list_full) <- rownames(res)
# Enlever NA
gene_list_full <- gene_list_full[!is.na(gene_list_full)]
# Convertir en Entrez IDs
gene_list_entrez <- mapIds(org.Hs.eg.db,
keys = names(gene_list_full),
column = "ENTREZID",
keytype = "SYMBOL",
multiVals = "first")
# Associer log2FC aux Entrez IDs
gene_list_gsea <- gene_list_full[!is.na(gene_list_entrez)]
names(gene_list_gsea) <- gene_list_entrez[!is.na(gene_list_entrez)]
# Ordonner par log2FC décroissant
gene_list_gsea <- sort(gene_list_gsea, decreasing = TRUE)
cat("Gènes dans la liste GSEA:", length(gene_list_gsea), "\n")#> Gènes dans la liste GSEA: 13749
#> Range log2FC: -8.89 10.38
# GSEA sur KEGG
gsea_kegg <- gseKEGG(
geneList = gene_list_gsea,
organism = "hsa",
pvalueCutoff = 0.05,
pAdjustMethod = "BH",
verbose = FALSE
)
cat("Pathways GSEA significatifs:", nrow(gsea_kegg), "\n\n")#> Pathways GSEA significatifs: 85
if(nrow(gsea_kegg) > 0) {
# Aperçu
cat("Top pathways GSEA:\n")
print(head(as.data.frame(gsea_kegg)[, c("Description", "NES", "pvalue", "p.adjust")], 10))
# Dotplot GSEA
p_gsea <- dotplot(gsea_kegg, showCategory = 20, split = ".sign") +
facet_grid(.~.sign) +
ggtitle("GSEA: KEGG Pathways")
print(p_gsea)
ggsave("figures/enrichment/GSEA_KEGG_dotplot.pdf", p_gsea, width = 14, height = 10)
# Sauvegarder
write.csv(as.data.frame(gsea_kegg),
"results/enrichment/GSEA_KEGG.csv",
row.names = FALSE)
}#> Top pathways GSEA:
#> Description NES pvalue
#> hsa04382 Cornified envelope formation 2.415821 0.0000000001000
#> hsa04110 Cell cycle 2.104312 0.0000000208105
#> hsa03008 Ribosome biogenesis in eukaryotes 2.252782 0.0000001297551
#> hsa04022 cGMP-PKG signaling pathway -1.967218 0.0000002595848
#> hsa04082 Neuroactive ligand signaling -1.921698 0.0000009087986
#> hsa03013 Nucleocytoplasmic transport 2.039804 0.0000012881096
#> hsa04024 cAMP signaling pathway -1.865874 0.0000011502711
#> hsa04725 Cholinergic synapse -1.972770 0.0000026689863
#> hsa00982 Drug metabolism - cytochrome P450 -2.063371 0.0000046358938
#> hsa04517 IgSF CAM signaling -1.728426 0.0000046120435
#> p.adjust
#> hsa04382 0.000000035100
#> hsa04110 0.000003652242
#> hsa03008 0.000015181342
#> hsa04022 0.000022778566
#> hsa04082 0.000063797662
#> hsa03013 0.000064589498
#> hsa04024 0.000064589498
#> hsa04725 0.000117101773
#> hsa00982 0.000162719873
#> hsa04517 0.000162719873
QUESTION 12a : Les pathways enrichis sont-ils similaires entre ORA et GSEA?
QUESTION 12b : GSEA détecte-t-il des pathways que ORA avait manqués?
QUESTION 12c : Qu’est-ce que le NES (Normalized Enrichment Score)?
QUESTION 12d : Quelle méthode préférez-vous et pourquoi?
if(nrow(gsea_kegg) > 0) {
# Prendre les 3 pathways les plus significatifs
top3_pathways <- head(gsea_kegg$Description, 3)
for(pathway in top3_pathways) {
p_enrich <- gseaplot2(
gsea_kegg,
geneSetID = which(gsea_kegg$Description == pathway),
title = pathway,
pvalue_table = TRUE
)
print(p_enrich)
# Sauvegarder
filename <- paste0("figures/enrichment/GSEA_plot_",
gsub("[^A-Za-z0-9]", "_", pathway), ".pdf")
ggsave(filename, p_enrich, width = 12, height = 8)
}
}#> === REACTOME PATHWAY ENRICHMENT ===
reactome_enrich <- enrichPathway(
gene = genes_entrez,
organism = "human",
pvalueCutoff = 0.05,
readable = TRUE
)
cat("Pathways Reactome enrichis:", nrow(reactome_enrich), "\n\n")#> Pathways Reactome enrichis: 69
if(nrow(reactome_enrich) > 0) {
# Aperçu
cat("Top 10 pathways Reactome:\n")
print(head(as.data.frame(reactome_enrich), 10))
# Dotplot
p_reactome <- dotplot(reactome_enrich, showCategory = 20) +
ggtitle("Reactome Pathway Enrichment")
print(p_reactome)
ggsave("figures/enrichment/Reactome_dotplot.pdf", p_reactome, width = 12, height = 10)
# Sauvegarder
write.csv(as.data.frame(reactome_enrich),
"results/enrichment/Reactome_enrichment.csv",
row.names = FALSE)
}#> Top 10 pathways Reactome:
#> ID
#> R-HSA-1474244 R-HSA-1474244
#> R-HSA-397014 R-HSA-397014
#> R-HSA-5576891 R-HSA-5576891
#> R-HSA-373076 R-HSA-373076
#> R-HSA-1474228 R-HSA-1474228
#> R-HSA-500792 R-HSA-500792
#> R-HSA-1474290 R-HSA-1474290
#> R-HSA-2022090 R-HSA-2022090
#> R-HSA-418594 R-HSA-418594
#> R-HSA-112316 R-HSA-112316
#> Description
#> R-HSA-1474244 Extracellular matrix organization
#> R-HSA-397014 Muscle contraction
#> R-HSA-5576891 Cardiac conduction
#> R-HSA-373076 Class A/1 (Rhodopsin-like receptors)
#> R-HSA-1474228 Degradation of the extracellular matrix
#> R-HSA-500792 GPCR ligand binding
#> R-HSA-1474290 Collagen formation
#> R-HSA-2022090 Assembly of collagen fibrils and other multimeric structures
#> R-HSA-418594 G alpha (i) signalling events
#> R-HSA-112316 Neuronal System
#> GeneRatio BgRatio pvalue
#> R-HSA-1474244 128/2292 300/11009 0.000000000000000003568353
#> R-HSA-397014 83/2292 205/11009 0.000000000082367074139901
#> R-HSA-5576891 60/2292 132/11009 0.000000000151986372085149
#> R-HSA-373076 117/2292 335/11009 0.000000000901040190032301
#> R-HSA-1474228 60/2292 140/11009 0.000000002615641747219934
#> R-HSA-500792 149/2292 468/11009 0.000000007625671314125572
#> R-HSA-1474290 43/2292 90/11009 0.000000009503244728464896
#> R-HSA-2022090 33/2292 61/11009 0.000000010210468574276890
#> R-HSA-418594 109/2292 318/11009 0.000000011137069061151775
#> R-HSA-112316 133/2292 410/11009 0.000000014081124847423994
#> p.adjust qvalue
#> R-HSA-1474244 0.000000000000005402486 0.000000000000004890522
#> R-HSA-397014 0.000000062351875123905 0.000000056443121331659
#> R-HSA-5576891 0.000000076702455778972 0.000000069433774194689
#> R-HSA-373076 0.000000341043711927226 0.000000308724823005804
#> R-HSA-1474228 0.000000792016321058196 0.000000716961169448496
#> R-HSA-500792 0.000001873502506509310 0.000001695960692119253
#> R-HSA-1474290 0.000001873502506509310 0.000001695960692119253
#> R-HSA-2022090 0.000001873502506509310 0.000001695960692119253
#> R-HSA-418594 0.000001873502506509310 0.000001695960692119253
#> R-HSA-112316 0.000002131882301899993 0.000001929855215931162
#> geneID
#> R-HSA-1474244 COL7A1/ITGA8/ITGA2/ITGA7/COL4A5/COL4A1/ITGAX/ITGAL/NRXN1/DMD/LOX/BMP7/BMP4/BMP2/COL14A1/MMP25/MATN3/PRSS1/ADAM12/COL10A1/COL28A1/TIMP1/COL4A6/CAPN14/CAPN13/CAPN12/PTPRS/LTBP4/COL11A1/COL11A2/LTBP2/JAM3/CD44/COL23A1/TTR/AGRN/DCN/COL1A2/COL27A1/JAM2/BSG/HAPLN1/LAMC2/MFAP2/CTSG/CTSS/ADAMTS2/ADAMTS8/NCAM1/FBLN2/FBLN1/FBLN5/MMP8/MMP9/MMP7/MMP3/SCUBE3/SCUBE1/A2M/COL9A2/COL9A3/COL9A1/MMP1/MFAP5/MFAP4/COL22A1/ITGB3/ITGB7/ITGB8/FGA/FGB/COL5A2/COL5A3/COL5A1/ITGA11/COL8A1/KLK7/DDR2/COL1A1/ACAN/EMILIN3/SPP1/DST/COL17A1/COL12A1/COMP/IBSP/P4HA3/MMP14/MMP10/MMP11/MMP12/MMP13/GDF5/TPSAB1/LAMC3/EFEMP1/PCOLCE2/SERPINE1/LAMA1/SERPINH1/LOXL2/LOXL4/ICAM5/ICAM3/SPARC/MADCAM1/ADAMTS14/ADAMTS18/ADAM8/MUSK/COL3A1/CEACAM1/PLOD3/TLL1/FGF2/NTN4/ADAMTS4/ADAMTS1/COL24A1/COL21A1/CAPN5/CAPN2/CAPN9/TNXB/CMA1/BGN/VCAN
#> R-HSA-397014 ATP2A1/ATP2A3/ATP1A2/CLIC2/TMOD2/DMD/SRI/SCN2B/KCNK9/KCNK5/AKAP9/FKBP1B/TNNT1/KCNA5/KAT2B/DES/SCN3A/SCN3B/NEB/STIM1/DMPK/SLN/KCNIP4/KCNIP3/TPM2/MYH11/TRIM72/TNNI3/CACNB2/CACNB1/MME/SCN1B/SORBS1/KCNK2/KCNK3/GUCY1A2/SCN7A/FGF13/KCNJ11/KCNJ14/NOS1/ATP1B2/SCN9A/SLC8A3/SLC8A2/ATP2B4/CACNG4/CACNA1H/CACNA1C/TMOD1/ITPR1/KCNK10/LMOD1/TPM1/CALD1/GATA4/TNNT2/KCND3/PDE5A/SCN11A/KCNK15/FXYD6/FXYD2/FXYD3/FXYD1/NPR1/CAMK2A/CAMK2B/ACTG2/SLC8A1/SCN4A/SCN4B/ACTA2/PLN/TNNC1/CASQ2/CORIN/MYL6/MYL9/MYLK/ABCC9/RYR1/RYR3
#> R-HSA-5576891 ATP2A1/ATP2A3/ATP1A2/CLIC2/SRI/SCN2B/KCNK9/KCNK5/AKAP9/FKBP1B/KCNA5/KAT2B/SCN3A/SCN3B/STIM1/DMPK/SLN/KCNIP4/KCNIP3/TNNI3/CACNB2/CACNB1/MME/SCN1B/KCNK2/KCNK3/SCN7A/FGF13/KCNJ11/KCNJ14/NOS1/ATP1B2/SCN9A/SLC8A3/SLC8A2/ATP2B4/CACNG4/CACNA1C/ITPR1/KCNK10/GATA4/KCND3/SCN11A/KCNK15/FXYD6/FXYD2/FXYD3/FXYD1/NPR1/CAMK2A/CAMK2B/SLC8A1/SCN4A/SCN4B/PLN/CASQ2/CORIN/ABCC9/RYR1/RYR3
#> R-HSA-373076 GRP/PNOC/NMU/NMB/GPR4/CCL28/CCL21/CCL20/CCL23/CCL25/MTNR1A/ADRA2C/SSTR5/CNR1/CNR2/SAA1/EDNRA/EDNRB/CYSLTR2/CYSLTR1/ECE2/ECE1/CX3CR1/GPR143/P2RY10/PF4/GPR55/AGT/GPR183/NPW/XCL2/F2R/NTSR1/CXCL11/CXCL10/CXCL13/CXCL12/PTGDR/OXTR/LPAR1/CXCL1/CXCL3/CXCL2/CXCL5/CXCL6/PTGFR/PENK/CHRM4/CHRM2/HTR1D/MC1R/P2RY2/SSTR1/ADRA2A/CCR2/CCR3/CCR7/CCR8/CCR9/XCL1/TACR2/TACR1/UTS2/SST/NPSR1/NPY2R/CMKLR1/OPN3/ADRA1B/ADRA1D/CCL3/KISS1/APLN/CCL11/CCL13/CCL19/NPY1R/P2RY14/P2RY12/P2RY13/P2RY1/DRD5/DRD2/SSTR2/LTB4R/F2/ADORA2A/GPR37L1/POMC/BDKRB1/BDKRB2/F2RL2/F2RL1/CCL5/PPBP/ADRB1/ADRB2/CCR10/S1PR5/KNG1/INSL5/PTGER4/PTGER3/PYY/CXCR5/RLN2/ADORA3/HTR7/HTR4/GPR17/GPR18/NMUR1/OXGR1/AGTR1/GRPR/EDN3/EDN2
#> R-HSA-1474228 COL7A1/COL4A5/COL4A1/COL14A1/MMP25/PRSS1/COL10A1/TIMP1/COL4A6/CAPN14/CAPN13/CAPN12/COL11A1/COL11A2/CD44/COL23A1/DCN/COL1A2/BSG/LAMC2/CTSG/CTSS/ADAMTS8/MMP8/MMP9/MMP7/MMP3/SCUBE3/SCUBE1/A2M/COL9A2/COL9A3/COL9A1/MMP1/COL5A2/COL5A3/COL5A1/COL8A1/KLK7/COL1A1/ACAN/SPP1/COL17A1/COL12A1/MMP14/MMP10/MMP11/MMP12/MMP13/TPSAB1/ADAMTS18/ADAM8/COL3A1/TLL1/ADAMTS4/ADAMTS1/CAPN5/CAPN2/CAPN9/CMA1
#> R-HSA-500792 GRP/PNOC/NMU/NMB/GPR4/CCL28/CCL21/CCL20/CCL23/CCL25/MTNR1A/ADRA2C/SSTR5/WNT9A/TAS2R38/WNT3/WNT2/CNR1/CNR2/SAA1/EDNRA/EDNRB/GNG11/GNG12/GNG13/CYSLTR2/CYSLTR1/SHH/VIPR1/ECE2/ECE1/CX3CR1/GPR143/P2RY10/GNG7/GNG4/GNG3/PF4/GPR55/VIP/GNG2/AGT/GPR183/NPW/XCL2/CALCA/F2R/NTSR1/VIPR2/FZD3/FZD5/CXCL11/CXCL10/CXCL13/CXCL12/PTGDR/OXTR/LPAR1/CXCL1/CXCL3/CXCL2/CXCL5/CXCL6/PTGFR/PENK/CHRM4/CHRM2/HTR1D/MC1R/GLP2R/P2RY2/SSTR1/ADRA2A/CCR2/CCR3/CCR7/CCR8/CCR9/XCL1/TACR2/TACR1/UTS2/SST/NPSR1/NPY2R/CMKLR1/DHH/OPN3/ADRA1B/ADRA1D/FZD10/CCL3/KISS1/APLN/CCL11/CCL13/CCL19/NPY1R/P2RY14/P2RY12/P2RY13/WNT7B/P2RY1/DRD5/DRD2/WNT5A/SSTR2/LTB4R/F2/ADORA2A/GPR37L1/CD55/POMC/BDKRB1/BDKRB2/WNT2B/UCN2/UCN3/GCG/F2RL2/F2RL1/CCL5/WNT11/PPBP/ADRB1/ADRB2/CCR10/GRM8/GRM4/S1PR5/KNG1/INSL5/PTGER4/PTGER3/PYY/CXCR5/RLN2/ADORA3/HTR7/HTR4/GPR17/GPR18/NMUR1/OXGR1/AGTR1/PTH1R/GRPR/EDN3/EDN2
#> R-HSA-1474290 COL7A1/COL4A5/COL4A1/LOX/COL14A1/COL10A1/COL28A1/COL4A6/COL11A1/COL11A2/COL23A1/COL1A2/COL27A1/LAMC2/CTSS/ADAMTS2/MMP9/MMP7/MMP3/COL9A2/COL9A3/COL9A1/COL22A1/COL5A2/COL5A3/COL5A1/COL8A1/COL1A1/DST/COL17A1/COL12A1/P4HA3/MMP13/PCOLCE2/SERPINH1/LOXL2/LOXL4/ADAMTS14/COL3A1/PLOD3/TLL1/COL24A1/COL21A1
#> R-HSA-2022090 COL7A1/COL4A5/COL4A1/LOX/COL14A1/COL10A1/COL4A6/COL11A1/COL11A2/COL1A2/COL27A1/LAMC2/CTSS/MMP9/MMP7/MMP3/COL9A2/COL9A3/COL9A1/COL5A2/COL5A3/COL5A1/COL8A1/COL1A1/DST/COL17A1/COL12A1/MMP13/LOXL2/LOXL4/COL3A1/TLL1/COL24A1
#> R-HSA-418594 CAMK4/GPSM2/PNOC/NMU/PRKACB/GNA15/GNA11/CCL28/CCL21/CCL20/CCL23/CCL25/MTNR1A/ADRA2C/SSTR5/GNAL/TAS2R38/CNR1/CNR2/SAA1/GNG11/GNG12/GNG13/ADCY5/ADCY2/ADCY3/ADCY9/CX3CR1/GNG7/GNG4/GNG3/PF4/GPR55/GNG2/AGT/GPR183/NPW/RGS5/RGS9/CXCL11/CXCL10/CXCL13/CXCL12/PDE1C/PDE1B/PDE1A/KPNA2/LPAR1/PRKCG/PRKAR2B/CXCL1/CXCL3/CXCL2/CXCL5/CXCL6/PENK/CHRM4/CHRM2/HTR1D/PLCB4/SSTR1/ADRA2A/CCR2/CCR3/CCR7/CCR8/CCR9/SST/NPY2R/ITPR1/OPN3/GNAZ/APLN/CCL13/CCL19/NPY1R/P2RY14/P2RY12/P2RY13/GNAI1/SSTR2/GPR37L1/NBEA/POMC/BDKRB1/BDKRB2/CCL5/PPBP/CCR10/CAMK2A/CAMK2B/GRM8/GRM4/S1PR5/KNG1/INSL5/PDE4C/PTGER3/PYY/CXCR5/ADORA3/GPR17/GPR18/NMUR1/OXGR1/RGS18/PDE4D/RGS16/RGS13
#> R-HSA-112316 CAMK4/LRRTM1/LRRTM2/NRXN1/CPLX1/NTRK3/PRKCB/GJC1/PRKACB/KCNK9/GRIN2B/GRIN2D/AKAP5/SLC18A2/SLC17A7/KCNAB1/LRRC4B/SYT2/SYT7/PRKAA2/GNAL/EPB41L3/NRXN2/PTPRF/GNG11/GNG12/GNG13/ADCY5/ADCY2/ADCY3/ADCY9/KCNA5/KCNA3/PTPRS/GNG7/GNG4/GNG3/PANX2/BCHE/TUBA1A/GNG2/KCNN3/RPS6KA6/KCNG1/KCNG3/KCNQ4/CACNB2/CACNB1/SLC1A2/HCN1/NAAA/KPNA2/SYN3/PRKCG/PRKAR2B/KCNK2/KCNK3/GRIN2A/VAMP2/NLGN3/NRG1/NLGN1/SLC5A7/TSPAN7/KCNJ11/KCNJ15/KCNJ14/SLC6A1/MAOA/GABRA4/GRIA3/GRIA4/TUBB6/TUBB3/MAPT/CACNA2D1/CACNG4/CHRNA5/CHRNA7/CHRNA1/CHRNA3/KCNH4/KCNH1/KCNH8/CACNA1A/CACNA1E/SNAP25/KCNJ3/KCNMB1/DLG2/GRIN3A/GRIK2/GRIK3/GRIK5/RASGRF1/KCNN4/GRIP2/MAPK3/KCNK10/GNAI1/SYT12/KCNB1/HTR3E/HTR3A/PPFIA4/KIF17/KCND3/NEFL/NBEA/TUBB2B/HOMER1/CAMK2A/CAMK2B/APBA1/STX1A/KCNQ5/GLS2/KCNS3/GAD1/KCNMA1/STXBP1/TUBAL3/NLGN4Y/NLGN4X/SLC1A1/SYN2/CHRNB2/CHRNB4/GABRB1/SLITRK5/SLITRK3/ABCC8/ABCC9
#> Count
#> R-HSA-1474244 128
#> R-HSA-397014 83
#> R-HSA-5576891 60
#> R-HSA-373076 117
#> R-HSA-1474228 60
#> R-HSA-500792 149
#> R-HSA-1474290 43
#> R-HSA-2022090 33
#> R-HSA-418594 109
#> R-HSA-112316 133
# Créer un tableau résumé
summary_stats <- data.frame(
Statistique = c(
"Gènes testés",
"Gènes significatifs (padj < 0.05, |log2FC| > 1)",
"Sur-exprimés (Tumor > Normal)",
"Sous-exprimés (Tumor < Normal)",
"Ratio Up/Down",
"Variance expliquée PC1 (%)",
"Variance expliquée PC2 (%)",
"Termes GO-BP enrichis",
"Pathways KEGG enrichis (ORA)",
"Pathways GSEA positifs (NES > 0)",
"Pathways GSEA négatifs (NES < 0)"
),
Valeur = c(
nrow(counts_filtered),
nrow(res_sig),
nrow(up_regulated),
nrow(down_regulated),
round(nrow(up_regulated) / nrow(down_regulated), 2),
var_explained[1],
var_explained[2],
if(exists("ego_bp")) nrow(ego_bp) else 0,
if(exists("kegg_enrich")) nrow(kegg_enrich) else 0,
if(exists("gsea_kegg")) sum(gsea_kegg$NES > 0) else 0,
if(exists("gsea_kegg")) sum(gsea_kegg$NES < 0) else 0
)
)
cat("\n=== RÉSUMÉ DE L'ANALYSE ===\n\n")#>
#> === RÉSUMÉ DE L'ANALYSE ===
#> Statistique Valeur
#> 1 Gènes testés 16023.00
#> 2 Gènes significatifs (padj < 0.05, |log2FC| > 1) 4316.00
#> 3 Sur-exprimés (Tumor > Normal) 1994.00
#> 4 Sous-exprimés (Tumor < Normal) 2322.00
#> 5 Ratio Up/Down 0.86
#> 6 Variance expliquée PC1 (%) 20.26
#> 7 Variance expliquée PC2 (%) 11.06
#> 8 Termes GO-BP enrichis 1520.00
#> 9 Pathways KEGG enrichis (ORA) 84.00
#> 10 Pathways GSEA positifs (NES > 0) 25.00
#> 11 Pathways GSEA négatifs (NES < 0) 60.00
# Sauvegarder
write.csv(summary_stats,
"results/analysis_summary.csv",
row.names = FALSE)
cat("\n Résumé sauvegardé: results/analysis_summary.csv\n")#>
#> Résumé sauvegardé: results/analysis_summary.csv
#>
#> === TOP 10 GÈNES CANDIDATS POUR VALIDATION ===
# Critères:
# 1. Très significatif (padj très faible)
# 2. Fold change élevé
# 3. Expression moyenne élevée (pas un gène rare)
res_sig_df_ordered <- res_sig_df[order(res_sig_df$padj), ]
top_candidates <- head(res_sig_df_ordered, 10)
cols_to_show <- intersect(
c("symbol", "genename", "log2FoldChange", "padj", "baseMean"),
colnames(top_candidates)
)
print(top_candidates[, cols_to_show])#> symbol log2FoldChange
#> CDH3 CDH3 6.096033
#> ETV4 ETV4 5.498884
#> KRT80 KRT80 6.997234
#> FOXQ1 FOXQ1 6.211920
#> KIAA1199 KIAA1199 5.705061
#> UGP2 UGP2 -2.120057
#> CLDN1 CLDN1 4.939233
#> GRIN2D GRIN2D 4.908921
#> ESM1 ESM1 6.221027
#> BEST4 BEST4 -6.599301
#> padj
#> CDH3 0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
#> ETV4 0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000009143815
#> KRT80 0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001720139383068
#> FOXQ1 0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000046998172458780113056910077151684390628361143171787261962890625000000000000000
#> KIAA1199 0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000018617065247058253654385828745887465629493817687034606933593750000000000000000000000000000000000
#> UGP2 0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002449595215611143963867213368956754493410699069499969482421875000000000000000000000000000000000000000000
#> CLDN1 0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000360593848461482517045624540674708669030223973095417022705078125000000000000000000000000000000000000000000
#> GRIN2D 0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000021715266369409016919671956413750990577682387083768844604492187500000000000000000000000000000000000000000000000000000000
#> ESM1 0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000006389265246177807756021438390092725967406295239925384521484375000000000000000000000000000000000000000000000000000000000000
#> BEST4 0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000458646794488251987893107086691202312067616730928421020507812500000000000000000000000000000000000000000000000000000000000000
#> baseMean
#> CDH3 1835.6656
#> ETV4 2202.9720
#> KRT80 828.4886
#> FOXQ1 974.1229
#> KIAA1199 3130.5660
#> UGP2 2851.5609
#> CLDN1 2048.0136
#> GRIN2D 681.6258
#> ESM1 153.5210
#> BEST4 226.5957
# Sauvegarder l'environnement
sink("results/session_info.txt")
cat("=== INFORMATIONS DE SESSION ===\n\n")
cat("Date d'analyse:", as.character(Sys.time()), "\n\n")
cat("Système:\n")
print(Sys.info())
cat("\n\nPackages et versions:\n")
sessionInfo()
sink()
cat("Session info sauvegardée: results/session_info.txt\n")#> Session info sauvegardée: results/session_info.txt
# Sauvegarder l'espace de travail complet
save.image("results/TD_RNAseq_COAD_workspace.RData")
cat("Workspace sauvegardé: results/TD_RNAseq_COAD_workspace.RData\n")#> Workspace sauvegardé: results/TD_RNAseq_COAD_workspace.RData
#> (Pour recharger: load('results/TD_RNAseq_COAD_workspace.RData'))
1. Validation transcriptomique - qRT-PCR sur 10-15 gènes candidats - Cohorte indépendante (n=30-50 par groupe) - Contrôles : GAPDH, ACTB, HPRT1
2. Validation protéomique - Western Blot (top 5 gènes up/down) - Immunohistochimie sur TMA (Tissue MicroArray) - Scoring semi-quantitatif H-score
3. Études fonctionnelles - siRNA/CRISPR knockdown - Tests phénotypiques : prolifération, migration, invasion - Xénogreffes souris (in vivo)
Pour aller plus loin :
Cancer Genome Atlas Network. (2012). Comprehensive molecular characterization of human colon and rectal cancer. Nature, 487(7407), 330-337. DOI: 10.1038/nature11252
Love, M. I., Huber, W., & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15(12), 550. DOI: 10.1186/s13059-014-0550-8
Yu, G., Wang, L. G., Han, Y., & He, Q. Y. (2012). clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS, 16(5), 284-287. DOI: 10.1089/omi.2011.0118
Subramanian, A., et al. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. PNAS, 102(43), 15545-15550. DOI: 10.1073/pnas.0506580102
Hanahan, D., & Weinberg, R. A. (2011). Hallmarks of cancer: the next generation. Cell, 144(5), 646-674.
Guinney, J., et al. (2015). The consensus molecular subtypes of colorectal cancer. Nature Medicine, 21(11), 1350-1356.
# Recharger l'espace de travail
load("results/TD_RNAseq_COAD_workspace.RData")
# Réinitialiser le device graphique
dev.off()
# Augmenter la mémoire (Windows)
memory.limit(size = 16000)
# Vérifier les objets en mémoire
ls()
# Taille des objets
sort(sapply(ls(), function(x) object.size(get(x))), decreasing = TRUE)
# Nettoyer la mémoire
rm(list = ls())
gc()| Statistique | Signification | Seuil |
|---|---|---|
| p-value | Probabilité d’observer par hasard | < 0.05 |
| padj (FDR) | Proportion faux positifs attendue | < 0.05 |
Exemple : - 100 gènes avec p < 0.05 → 5 faux positifs attendus - 100 gènes avec FDR < 0.05 → ~5 faux positifs parmi les découvertes
| log2FC | FC | Interprétation |
|---|---|---|
| 1 | 2× | Doublement |
| 2 | 4× | Quadruplement |
| 3 | 8× | ×8 |
| -1 | 0.5× | Moitié |
| -2 | 0.25× | Quart |
COAD : Colon Adenocarcinoma
CRC : ColorRectal Cancer
DEG : Differentially Expressed Gene
FDR : False Discovery Rate
FC : Fold Change
VST : Variance Stabilizing Transformation
PCA : Principal Component Analysis
GO : Gene Ontology
KEGG : Kyoto Encyclopedia of Genes and Genomes
ORA : Over-Representation Analysis
GSEA : Gene Set Enrichment Analysis
NES : Normalized Enrichment Score
TCGA : The Cancer Genome Atlas
MSI : MicroSatellite Instability
CMS : Consensus Molecular Subtype